Study of Extended Euclidean and Itoh-Tsujii Algorithms in GF(2^m) using polynomial bases


by

Fan Zhou

B.Eng., Zhejiang University, 2013

A Report Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF ENGINEERING

in the Department of Electrical and Computer Engineering

© Fan Zhou, 2018

University of Victoria


Supervisory Committee

Dr. Fayez Gebali, Supervisor

(Department of Electrical and Computer Engineering)

Dr. Watheq El-Kharashi, Departmental Member (Department of Electrical and Computer Engineering)


ABSTRACT

Finite field arithmetic is important for information security. Among all finite field arithmetic operations, inversion consumes the most time and resources. In this report, two main classes of inversion algorithms are studied. The first class is based on the Extended Euclidean Algorithm, an extension of the Euclidean algorithm for computing the greatest common divisor. The second class is based on Fermat’s little theorem. Inverters of this class are also called multiplicative inverters because they perform the inversion by a sequence of multiplications and squarings. This report presents a literature review of inversion algorithms and implements a multiplicative inverter and an Extended Euclidean based inverter in MATLAB. The experimental results show that inverters based on the Extended Euclidean Algorithm are more efficient than inverters based on Fermat’s little theorem.


List of Tables

Table 2.1 An example of binary polynomial division

Table 2.2 An example of EEA

Table 3.1 Inverse of a ∈ GF(2^233) using an addition chain [1]

Table 4.1 Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor

Table 4.2 Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor

List of Figures

Figure 1.1 ECC Arithmetic Architecture

Figure 3.1 Flowchart of Itoh-Tsujii Algorithm

List of Acronyms

EEA Extended Euclidean Algorithm

ECC Elliptic Curve Cryptography

FLT Fermat’s Little Theorem

GCD Greatest Common Divisor

SM Scalar Multiplication

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Dr. Gebali, who provided me with valuable guidance and advice throughout my graduate study. Besides my supervisor, I would like to thank Ibrahim Hazmi for helping me improve my project. My gratitude also goes to my parents and my roommate, who constantly support me when I am in need.

DEDICATION


Chapter 1 Introduction

1.1 Background

Elliptic Curve Cryptography (ECC) is a public-key cryptosystem based on the algebraic structure of elliptic curves over finite fields, which can be used to create faster and more efficient cryptographic schemes.

The computations involved in implementing ECC cryptosystems form a pyramid of four levels of operations. Finite field (modular) arithmetic is the foundation of the pyramid, as it is the basic building block of elliptic curve point addition and point doubling. Scalar multiplication (SM), which is used by all ECC cryptographic protocols, is performed by repeating point addition and point doubling operations. Figure 1.1 illustrates the arithmetic architecture of the SM computational process.

An elliptic curve $E(K)$ over a field $K$ is defined by the equation [2]:

$$y^2 + a_1xy + a_3y = x^3 + a_2x^2 + a_4x + a_6 \tag{1.1}$$

where $a_1, a_2, a_3, a_4, a_6 \in K$ and the discriminant of $E$ satisfies $\Delta \neq 0$. In the binary field, the equation takes the form:

$$y^2 + xy = x^3 + ax^2 + b \tag{1.2}$$

where $a, b \in K$.

Figure 1.1: ECC Arithmetic Architecture [3]

1.2 Preliminaries: Binary Finite Field Arithmetic

The finite field $GF(2^m)$ of order $2^m$ is called a binary finite field. An element $a(x) \in GF(2^m)$ can be expressed as a binary polynomial of degree at most $m-1$ [2]:

$$a(x) = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \cdots + a_2x^2 + a_1x + a_0 \tag{1.3}$$

where $a_i \in \{0, 1\}$.

A polynomial $f(x)$ of degree $m$ is said to be irreducible over $GF(2)$ if there do not exist two polynomials $g(x)$ and $h(x)$ of lesser degree over $GF(2)$ such that $f(x) = g(x)h(x)$. In polynomial-basis arithmetic, the coefficients $a_i$ are either 0 or 1, and an irreducible polynomial $f(x)$ of degree $m$ is used to reduce the result of any operation whose degree is greater than $m-1$. For instance, the operations in the field $GF(2^5)$ can be defined with the irreducible polynomial $f(x) = x^5 + x^2 + 1$.

Computing point multiplication requires point doubling and point addition, which can be implemented using four basic operations, namely, addition, subtraction, multiplication and division.

Addition and subtraction in binary fields are performed by adding or subtracting the corresponding coefficients of the two polynomials and reducing the result modulo 2. For instance, let $a(x) = a_{m-1}x^{m-1} + \cdots + a_1x + a_0$, $b(x) = b_{m-1}x^{m-1} + \cdots + b_1x + b_0$ and $c(x) = a(x) + b(x) = c_{m-1}x^{m-1} + \cdots + c_1x + c_0$. If $a_k$, $b_k$ and $c_k$ are the coefficients of $a(x)$, $b(x)$ and $c(x)$ respectively, then:

$$c_k = (a_k + b_k) \bmod 2 \tag{1.4}$$

The computational complexity of addition and subtraction in a binary field is usually neglected.
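The coefficient-wise rule of Equation 1.4 can be sketched in a few lines. The example below is written in Python rather than the MATLAB of Appendix A, and it assumes that a field element is stored as an integer whose bit k holds the coefficient of x^k; the function name `gf2_add` is illustrative only.

```python
def gf2_add(a: int, b: int) -> int:
    """Add two binary polynomials: each coefficient is (a_k + b_k) mod 2.

    With coefficients packed into the bits of an int, this is a bitwise XOR.
    Subtraction is the same operation, because -1 = 1 (mod 2).
    """
    return a ^ b


a = 0b10101  # x^4 + x^2 + 1
b = 0b00111  # x^2 + x + 1
c = gf2_add(a, b)
print(bin(c))  # 0b10010, i.e. x^4 + x
```

Because the operation is a single XOR with no carries, adding the same polynomial twice returns the original operand, which is one way to see why its cost is usually neglected.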

Multiplication in a finite field is multiplication modulo an irreducible polynomial. Let $a(x)$ and $b(x)$ be elements of $GF(2^m)$ and let their modular product $c(x)$ also be an element of the field. The product $c(x)$ can be computed in two steps: first a polynomial multiplication of the two operands $a(x)$ and $b(x)$, followed by a modular reduction using the irreducible polynomial $f(x)$. Then, we have:

$$c(x) = a(x) \cdot b(x) \bmod f(x) \tag{1.5}$$
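The two steps of Equation 1.5 can be illustrated with a minimal Python sketch (not the report's MATLAB code). It assumes polynomials are packed into integers with bit k as the coefficient of x^k; the name `gf2_mulmod` is hypothetical.

```python
def gf2_mulmod(a: int, b: int, f: int) -> int:
    """Return a(x) * b(x) mod f(x) over GF(2)."""
    m = f.bit_length() - 1  # degree of the reduction polynomial f(x)

    # Step 1: polynomial (carry-less) product, one shifted copy of a per set bit of b.
    prod = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            prod ^= a << shift
        shift += 1

    # Step 2: modular reduction, cancelling the leading term until degree < m.
    while prod.bit_length() - 1 >= m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod


f = 0b100101  # x^5 + x^2 + 1, the GF(2^5) example polynomial used above
print(bin(gf2_mulmod(0b10000, 0b10, f)))  # x^4 * x = x^5 = x^2 + 1 -> 0b101
```

Keeping the product and the reduction as separate loops mirrors the two-step description in the text; hardware designs often interleave them instead.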

A great deal of work has been done on inversion in finite fields, since inversion is the most time-consuming of the four basic operations. The inverse of a polynomial $a(x)$ in $GF(2^m)$ is the polynomial $a^{-1}(x)$ in $GF(2^m)$ such that:

$$a(x) \cdot a^{-1}(x) \bmod f(x) = 1 \tag{1.6}$$

Inversion algorithms can be classified into two main categories: algorithms based on the Extended Euclidean Algorithm and algorithms based on Fermat’s Little Theorem. These two classes are discussed in Chapters 2 and 3.

1.3 Related Work

Several Extended Euclidean based inversion algorithms have been proposed in the literature [3-5]. In [4], a class of bit-serial unidirectional systolic architectures for inversion and division in polynomial basis was proposed. The authors also presented a variant of the Extended Euclidean Algorithm (EEA) optimized for unidirectional systolization with no carry propagation structure. This design also introduces a simpler distributed counter structure that is suitable for applications where the field dimension may be large or variable. Yan [5] presents two-dimensional systolic architectures for inversion based on a modified extended Euclidean algorithm. The new architecture uses a distributed control mechanism for a variety of field sizes and is suitable for Very Large Scale Integration (VLSI) implementation. In comparison to similar architectures, these architectures have smaller critical path delays and considerably lower hardware costs. An optimized inversion algorithm that is well suited to hardware was proposed in [6]. A two-dimensional multiplication/inversion systolic architecture and a one-dimensional multiplication/inversion systolic architecture were implemented and are well suited to the Elliptic Curve arithmetic unit required in elliptic curve cryptography.

For the Itoh-Tsujii inversion algorithm in $GF(2^m)$, Rebeiro [7] proposed a modification called the quad-Itoh-Tsujii algorithm, which was implemented on field-programmable gate array (FPGA) platforms. The adapted algorithm requires shorter addition chains and reduces the number of clock cycles significantly by using a parallel architecture. A modified Itoh-Tsujii algorithm for inversion with polynomial basis was proposed in [8]. An optimal addition chain was used to reduce the operation time by computing part of the multiplications and squarings in parallel. Their inversion architecture with a digit-serial multiplier experimentally obtained a 61% timing improvement and used 69% fewer resources on average than previous designs with normal basis. Another parallel version of the Itoh-Tsujii algorithm was proposed in [9]. It uses a special class of irreducible trinomials, namely $P(x) = x^m + x^k + 1$, to achieve its best performance. This special class of irreducible trinomials reduces the computational complexity and yields a 30% timing improvement on average compared to the standard version of the algorithm. In [10], a high-performance and high-speed FPGA implementation of the polynomial basis Itoh-Tsujii algorithm (ITA) over $GF(2^m)$ is designed using an efficient digit-serial multiplier and k-times squarer blocks, where $k$ is a small positive integer. Their design provides an improvement compared with other implementations of the polynomial basis Itoh-Tsujii inversion algorithm.

1.4 Project Contributions

This project aims at finding an effective algorithm to perform inversion. The contributions of this project are:

1) A literature review of finite field arithmetic and related work on inversion algorithms.

2) An introduction to the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.

3) An implementation of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm in MATLAB.

4) A comparison of the Extended Euclidean algorithm and the Itoh-Tsujii algorithm.

1.5 Report Organization

This report is organized as follows. Chapter 2 introduces the Extended Euclidean based algorithm in the field $GF(2^m)$. The inverter based on Fermat’s little theorem is presented in Chapter 3. The MATLAB implementation of these two algorithms is described in Chapter 4. Chapter 5 concludes the report.

Chapter 2 Extended Euclidean algorithm

The Euclidean algorithm calculates the greatest common divisor (GCD) of two integers. It makes use of the facts that GCD(m, n) = GCD(m − n, n) and GCD(m, 0) = m and simply repeats the subtraction until n is zero. A more efficient way of doing this is to use

$$\mathrm{GCD}(m, n) = \mathrm{GCD}(n, m \bmod n) \tag{2.1}$$

and repeat until n is zero. For example, GCD(38, 8) = GCD(8, 6) = GCD(6, 2) = GCD(2, 0) = 2. Since a modulo operation is equivalent to repeated subtraction, this is essentially the same algorithm, with several subtractions done at once. In the case of prime fields, a great number of variants of the Euclidean algorithm have been developed for use in cryptographic applications, as in [11]. The Extended Euclidean Algorithm may also be used to find the multiplicative inverse of polynomials over $GF(2^m)$.
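Both forms of the integer Euclidean algorithm described above can be sketched as follows. This is an illustrative Python example with hypothetical function names; the subtraction variant swaps its operands when needed so that the difference stays non-negative, a detail the text leaves implicit.

```python
def gcd_subtract(m: int, n: int) -> int:
    """GCD by repeated subtraction: GCD(m, n) = GCD(m - n, n), GCD(m, 0) = m."""
    while n != 0:
        if m < n:
            m, n = n, m  # keep the larger value in m before subtracting
        m -= n
    return m


def gcd_mod(m: int, n: int) -> int:
    """GCD by Equation 2.1: GCD(m, n) = GCD(n, m mod n), until n is zero."""
    while n != 0:
        m, n = n, m % n
    return m


print(gcd_subtract(38, 8))  # 2
print(gcd_mod(38, 8))       # 2, via (38, 8) -> (8, 6) -> (6, 2) -> (2, 0)
```

The modulo form reaches the same result in three steps here, whereas the subtraction form needs one step per subtraction.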

2.1 Extended Euclidean algorithm

Let $f(x)$ be the irreducible polynomial of degree $m$ that defines $GF(2^m)$, and let $a(x)$ be a nonzero element of the field in the polynomial basis. Since $f(x)$ is irreducible and $\deg a(x) < \deg f(x)$, $a(x)$ and $f(x)$ are relatively prime. If we initiate the Euclidean algorithm with $f(x)$ and $a(x)$, the extended algorithm generates two polynomials, $s(x)$ and $t(x)$, with $\deg s(x) < m$ and $\deg t(x) < m-1$. These polynomials satisfy [12]:

$$a(x)s(x) + f(x)t(x) = \mathrm{GCD}(a(x), f(x)) \tag{2.2}$$

Since $f(x)$ is irreducible, we have $\mathrm{GCD}(a(x), f(x)) = 1$. Hence, $a(x)s(x) + f(x)t(x) = 1$. In $GF(2^m)$, arithmetic is performed modulo $f(x)$, so $f(x)t(x) \equiv 0$ and the identity simplifies to $a(x)s(x) \equiv 1 \pmod{f(x)}$. The inverse element $a^{-1}(x)$ therefore has the polynomial representation $s(x)$, and we can use the EEA for inversion in $GF(2^m)$ using a polynomial basis.

Algorithms 2.1 and 2.2 show the EEA algorithm.

Algorithm 2.1 Binary Polynomial Division (PolyDivide) [13]

Input: Polynomial a(x) of degree m − 1 and f(x) of degree m.

Output: r and q.

1: a ← a(x), f ← f(x), r ← 1, q ← 1
2: while (f_deg ≥ a_deg) do
3:     a ← a << (f_deg − a_deg)
4:     r ← a ⊕ f
5:     if (r_deg ≥ a_deg) then
6:         q ← (q << (f_deg − r_deg)) + 1
7:     else
8:         q ← (q << (f_deg − a_deg))
9:     end if
10:    f ← r
11: end while

Algorithm 2.2 Extended Euclidean Algorithm (EEA) [13]

Input: Polynomial a(x) of degree m − 1 and f(x) of degree m.

Output: a(x)^{-1} mod f(x).

1: a ← a(x), f ← f(x), t ← 0, s ← 1
2: while r ≠ 0 (gcd ≠ 1) do
3:     Perform PolyDivide to find r and q (f = a × q + r)
4:     f ← a, a ← r
5:     t ← s, s ← t − q × s
6: end while

Algorithm 2.1 is the binary division algorithm. The loop from line 2 to line 11 computes the remainder and the quotient when $f(x)$ is divided by $a(x)$. In line 3, we first left shift $a(x)$ by $f_{deg} - a_{deg}$ and then, in line 4, XOR the shifted $a(x)$ with $f(x)$ to obtain $r(x)$. In lines 5 to 9, we compare the degree of $r(x)$ with $d_a$, the degree of the initial (unshifted) $a(x)$. If the degree of $r(x)$ is greater than or equal to $d_a$, we left shift $q(x)$ by $f_{deg} - r_{deg}$ and set its last bit to 1. If the degree of $r(x)$ is smaller than $d_a$, we left shift $q(x)$ by $f_{deg} - a_{deg}$. We then assign the value of $r(x)$ to $f(x)$ in line 10. This process is repeated until the degree of $f(x)$ is smaller than $d_a$.

For instance, let $f(x)$ = 1111010000 and $a(x)$ = 11011. Let $d_a$ be the degree of the initial value of $a(x)$, which is 4, $d_f$ the degree of $f(x)$, $d_r$ the degree of $r(x)$, $d_{fa} = d_f - d_a$, $d_{fr} = d_f - d_r$, and $q(x)$ the quotient. As shown in Table 2.1, the initial value of $d_{fa}$ is 5, so we left shift $a(x)$ by 5 and the new value of $a(x)$ is 1101100000. The value of $r(x)$ is computed as $a(x)$ XOR $f(x)$. Since $d_r$ is greater than $d_a$ and $d_{fr}$ is 2, we left shift $q(x)$ by 2 and set its last bit to 1. After the third iteration, $d_r$ is 2, which is smaller than 4, so the division ends with quotient 101100 and remainder 100.

Table 2.1: An example of binary polynomial division

| Iteration | a(x) | r(x) | d_fa | d_fr | q(x) |
|---|---|---|---|---|---|
| 0 | 11011 | 1 | 5 | | 1 |
| 1 | 1101100000 | 10110000 | 3 | 2 | 101 |
| 2 | 11011000 | 1101000 | 2 | 1 | 1011 |
| 3 | 1101100 | 100 | | | 101100 |
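The division worked through in Table 2.1 can be reproduced with a short Python sketch. This is not a transcription of Algorithm 2.1 or of the MATLAB routine in Appendix A: it is a plain shift-and-XOR long division that sets one quotient bit per step, under the assumption that polynomials are stored as integers with bit k as the coefficient of x^k. The name `gf2_divmod` is illustrative.

```python
def gf2_divmod(f: int, a: int):
    """Divide f(x) by a(x) over GF(2); return (quotient, remainder)."""
    if a == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    da = a.bit_length() - 1          # degree of the divisor
    q, r = 0, f
    while r.bit_length() - 1 >= da:  # remainder still has degree >= deg a
        shift = r.bit_length() - 1 - da
        q |= 1 << shift              # record the quotient term x^shift
        r ^= a << shift              # cancel the leading term of r
    return q, r


q, r = gf2_divmod(0b1111010000, 0b11011)
print(bin(q), bin(r))  # 0b101100 0b100, the quotient and remainder of Table 2.1
```

The shifts applied to the divisor (5, 3 and 2) match the shifted values of a(x) listed in the table, and the result satisfies f = a × q + r with deg r < deg a.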

Algorithm 2.2 uses the remainder and the quotient produced by Algorithm 2.1 and iteratively computes the coefficient $s$, which is the inverse of $a$. The computational complexity of the EEA is $O(m^2)$ [10].

The proof of Algorithm 2.2 relies on the fact that, for two nonzero polynomials $a$ and $f$, the EEA produces the unique pair of polynomials $(s, t)$ such that:

$$ft + as = \mathrm{GCD}(f, a) \tag{2.3}$$

If we replace $f$ and $a$ by $a$ and $(f \bmod a)$, and let $\bar{t}$ and $\bar{s}$ be the previous values of $t$ and $s$, we have:

$$a\bar{t} + (f \bmod a)\bar{s} = \mathrm{GCD}(a, f \bmod a) \tag{2.4}$$

From Equation 2.1, it follows that:

$$\mathrm{GCD}(f, a) = \mathrm{GCD}(a, f \bmod a) \tag{2.5}$$

By Equation 2.5, the right-hand sides of Equations 2.3 and 2.4 are equal, so we can write:

$$ft + as = a\bar{t} + (f \bmod a)\bar{s} \tag{2.6}$$

Since the Euclidean division of $f$ by $a$ may be written $f = aq + r$ with $r = f \bmod a = f - aq$, we rearrange Equation 2.6 as:

$$ft + as = a\bar{t} + (f \bmod a)\bar{s} = a\bar{t} + (f - aq)\bar{s} = f\bar{s} + a(\bar{t} - q\bar{s})$$

Hence,

$$t = \bar{s}, \qquad s = \bar{t} - q\bar{s} \tag{2.7}$$

In this recursion, the new value of $s$, which is the output of Algorithm 2.2, is computed directly from its current value and its previous value by the formula $s = \bar{t} - q\bar{s}$. After iteratively computing the coefficient $s$ using the quotients obtained from Algorithm 2.1, we obtain the value of $s$ that is the inverse of $a$.

2.2 An example of Extended Euclidean algorithm

Table 2.2 shows an example of how the EEA works, where $m = 25$, $f(x) = x^{25} + x^3 + 1$ = 0x2000009 and $a(x) = x^{24} + x^7 + x^6 + x^5 + x$ = 0x10000E2. The variables in this example are displayed in hexadecimal representation. All variables are initialized as follows: $r = 1$, $q = 1$, $t = 0$, $s = 1$. Each variable is then computed as follows:

$f(i) = a(i-1)$

$a(i) = r(i-1)$

$r(i)$ and $q(i)$ are calculated using Algorithm 2.1 and the values of $f(i)$ and $a(i)$.

$t(i) = s(i-1)$

$s(i) = t(i-1) \oplus (q(i) \times s(i-1))$

At the fifth iteration, $r$ is 1 and the value of $r$ in the next iteration would be 0. Thus the value of $s$ in this iteration is the inverse of $a(x)$ mod $f(x)$.

As shown in Table 2.2, the multiplicative inverse of 0x10000E2 modulo 0x2000009 is 0x54ED9E.

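Table 2.2: An example of EEA

| It# | a | f | r | q | t | s |
|---|---|---|---|---|---|---|
| 0 | 10000E2 | 2000009 | 1 | 1 | 0 | 1 |
| 1 | 1CD | 10000E2 | 1CD | 2 | 1 | 2 |
| 2 | D0 | 1CD | D0 | 1B8CA | 2 | 37195 |
| 3 | 6D | D0 | 6D | 2 | 37195 | 6E328 |
| 4 | A | 6D | A | 2 | 6E328 | EB7C5 |
| 5 | 1 | A | 1 | E | EB7C5 | 54ED9E |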
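The iteration of Table 2.2 can be replayed with the following Python sketch, which applies the update $t \leftarrow s$, $s \leftarrow t \oplus (q \times s)$ after each polynomial division. It is an illustration under stated assumptions, not the MATLAB code of Appendix A: polynomials are stored as integers (bit k is the coefficient of x^k), the helper names are hypothetical, and the loop stops as soon as the remainder equals 1.

```python
def gf2_divmod(f, a):
    """Quotient and remainder of f(x) / a(x) over GF(2)."""
    da = a.bit_length() - 1
    q, r = 0, f
    while r.bit_length() - 1 >= da:
        shift = r.bit_length() - 1 - da
        q |= 1 << shift
        r ^= a << shift
    return q, r


def gf2_mul(a, b):
    """Carry-less product a(x) * b(x) with no modular reduction."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    return prod


def gf2_inverse_eea(a, f):
    """Inverse of a(x) modulo an irreducible f(x), assuming 0 < deg a < deg f."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    t, s = 0, 1
    u, v = f, a                    # dividend and divisor of the current step
    while v != 1:
        q, r = gf2_divmod(u, v)
        if r == 0:
            raise ValueError("a(x) and f(x) are not relatively prime")
        u, v = v, r
        t, s = s, t ^ gf2_mul(q, s)  # subtraction is XOR over GF(2)
    return s


inv = gf2_inverse_eea(0x10000E2, 0x2000009)
print(hex(inv))  # 0x54ed9e, the final s in Table 2.2
```

The intermediate values of s produced by this loop are 2, 0x37195, 0x6E328, 0xEB7C5 and 0x54ED9E, matching the s column of the table.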


Chapter 3 Itoh-Tsujii algorithm

3.1 Inversion based on Fermat’s little theorem

The simple, primary dividers based on Fermat’s little theorem are also known as multiplicative dividers because, with Fermat’s little theorem, the division is performed by a sequence of squarings and multiplications.

The Itoh-Tsujii algorithm, based on Fermat’s little theorem, was originally proposed in [14] for the normal basis representation. Since its publication, however, several improvements and variations have been reported in [6-8], showing that it can also be used with other field representations such as the polynomial representation. To compute an inverse using the normal basis representation, a basis conversion between polynomial and normal bases is needed at the beginning and end of the operation. The algorithm that converts polynomial bases to normal bases is complicated and computationally expensive, which slows down inversion in normal bases [15].

Binary field multiplication using the normal basis representation is more complicated and more costly in time and implementation area than multiplication using polynomial bases [16]. Therefore, normal bases are competitive only when very few multiplications are required [17]. The Itoh-Tsujii algorithm computes the multiplicative inverse using a series of multiplications and squarings. Although squaring in normal bases is computed by a cyclic shift of the binary representation [18], the higher computational complexity of multiplication reduces the overall efficiency.

Therefore, this project implements the Itoh-Tsujii algorithm using a polynomial basis and compares its performance with that of the EEA in the same basis.

Let $p$ be a prime and let $a$ be an integer satisfying $\mathrm{GCD}(a, p) = 1$. Then:

$$a^{p-1} \equiv 1 \pmod{p} \tag{3.1}$$

or

$$a \times a^{p-2} \equiv 1 \pmod{p} \tag{3.2}$$

Hence the inverse of any nonzero integer $a$ in $GF(p)$ is $a^{p-2}$.

For example, in $GF(5)$, let $a = 3$. Then the inverse of 3 in $GF(5)$ is $3^{5-2} = 27 \equiv 2 \pmod{5}$, and indeed $2 \times 3 = 6 \equiv 1 \pmod{5}$.

Extending this technique to $GF(2^m)$, for any nonzero $a$ we can write $a^{2^m-1} = a \times a^{2^m-2} = 1$. Hence:

$$a^{-1} = a^{2^m-2} \tag{3.3}$$
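Equation 3.3 can be checked directly with a plain square-and-multiply exponentiation, before any addition-chain optimization is applied. The Python sketch below is illustrative only: it assumes integers as coefficient bit vectors, and `gf2_mulmod` and `gf2_inverse_flt` are hypothetical helper names rather than functions from Appendix A.

```python
def gf2_mulmod(a, b, f):
    """Return a(x) * b(x) mod f(x) over GF(2)."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() - 1 >= m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod


def gf2_inverse_flt(a, f):
    """Compute a^(2^m - 2) mod f(x) by binary square-and-multiply."""
    m = f.bit_length() - 1
    e = (1 << m) - 2          # the exponent 2^m - 2 of Equation 3.3
    result, base = 1, a
    while e:
        if e & 1:
            result = gf2_mulmod(result, base, f)
        base = gf2_mulmod(base, base, f)
        e >>= 1
    return result


print(pow(3, 5 - 2, 5))                            # 2, the GF(5) example
print(bin(gf2_inverse_flt(0b10101, 0b100101)))     # 0b11010 in GF(2^5)
```

This direct form uses roughly one multiplication per exponent bit, which is the cost that the Itoh-Tsujii rearrangement reduces.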

3.2 Itoh-Tsujii algorithm

The Itoh-Tsujii algorithm is based on Fermat’s little theorem, by which the inverse of an element $a \in GF(2^m)$ is computed as $a^{-1} = a^{2^m-2}$.

A straightforward implementation of Equation 3.3 requires $m-2$ multiplications and $m-1$ squarings. The Itoh-Tsujii algorithm reduces the number of multiplications to $\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$, where $HW(m-1)$ is the Hamming weight of the binary representation of $m-1$, while the number of required squarings remains $m-1$ [10]. This remarkable saving in the number of multiplications is based on the observation that the inverse can be rewritten, following [19], as:

$$a^{-1} = [S_{m-1}(a)]^2 \tag{3.4}$$

where $S_k(a) = a^{2^k-1} \in GF(2^m)$ and $k \in \mathbb{N}$. Let $k, j$ be two positive integers. Then the element $S_{k+j} \in GF(2^m)$ can be expressed as:

$$S_{k+j} = (S_k)^{2^j} \cdot S_j = (S_j)^{2^k} \cdot S_k \tag{3.5}$$
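Equations 3.4 and 3.5 can be checked numerically in a small field. The following Python sketch works in $GF(2^5)$ with $f(x) = x^5 + x^2 + 1$ and computes $S_k(a) = a^{2^k-1}$ directly as the product $a \cdot a^2 \cdot a^4 \cdots a^{2^{k-1}}$. All function names are assumptions made for the illustration.

```python
def gf2_mulmod(a, b, f):
    """Return a(x) * b(x) mod f(x) over GF(2)."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() - 1 >= m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod


def square_times(x, j, f):
    """Return x^(2^j) mod f(x), i.e. square x repeatedly j times."""
    for _ in range(j):
        x = gf2_mulmod(x, x, f)
    return x


def S(a, k, f):
    """S_k(a) = a^(2^k - 1), built as a * a^2 * a^4 * ... * a^(2^(k-1))."""
    result, term = 1, a
    for _ in range(k):
        result = gf2_mulmod(result, term, f)
        term = gf2_mulmod(term, term, f)
    return result


f, a = 0b100101, 0b10101
k, j = 3, 1
lhs = S(a, k + j, f)
rhs = gf2_mulmod(square_times(S(a, k, f), j, f), S(a, j, f), f)
print(lhs == rhs)                              # True: Equation 3.5 for k=3, j=1
print(bin(square_times(S(a, 4, f), 1, f)))     # 0b11010: Equation 3.4 with m=5
```

With $m = 5$, squaring $S_4(a) = a^{15}$ gives $a^{30} = a^{2^5-2}$, which is the inverse of $a$.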

In the Itoh-Tsujii algorithm, an addition chain is used to reduce the number of multiplications required and to perform the field exponentiation more efficiently. An addition chain for an integer such as $m-1$ is a sequence of $t$ positive integers $C = \{c_1, c_2, \cdots, c_t\}$. Algorithm 3.1 and the flowchart in Figure 3.1 show how to compute the addition chain. Given $f(x)$ of degree $m$, we have $c_1 = 1$ and $c_t = m-1$. If $c_i$ is even, $c_{i-1} = c_i/2$, and if $c_i$ is odd, $c_{i-1} = c_i - 1$.

Hence, to compute $a^{-1}$, we use Equation 3.3 and an addition chain constructed using Algorithm 3.1 to obtain $S_{m-1}(a) = a^{2^{m-1}-1}$.

The Itoh-Tsujii Algorithm is given in Algorithm 3.2 and its flowchart is shown in Figure 3.2. By Equation 3.4, we can compute the inverse of $a$ by squaring $S_{m-1}(a)$. Algorithm 3.2 therefore iteratively computes the $S_i$ coefficients in the order stipulated by the addition chain. In the final iteration, after having computed the coefficient $S_t = a^{2^{m-1}-1}$, the algorithm returns the required multiplicative inverse by performing a regular field squaring, namely $S_t^2 = a^{2^m-2} = a^{-1}$. The inverse of $a$ is $S_t^2$.

It has been shown that the maximum number of multiplications in this method is $t$ and the required number of squarings is $m-1$, where $t$ is the step-length of the addition chain for $m-1$ [9].

The advantage of inversion algorithms based on Fermat’s little theorem is that they can be implemented using only multiplication and squaring. This eliminates the need to add any extra components, such as dividers.

Algorithm 3.1 Finding the addition chain

Input: Polynomial a(x) of degree m − 1 and f(x) of degree m.

Output: An addition chain C of length t.

1: i ← 1, C(i) ← m − 1
2: while C(i) > 1 do
3:     if C(i) mod 2 == 0 then
4:         C(i + 1) ← C(i)/2
5:     else
6:         C(i + 1) ← C(i) − 1
7:     end if
8:     i ← i + 1
9: end while

Algorithm 3.2 Itoh-Tsujii Algorithm [19]

Input: Polynomial a(x) of degree m − 1 and f(x) of degree m.

Output: a(x)^{-1} mod f(x).

1: $S_0 \leftarrow a(x)$
2: for i from 1 to t do
3:     $S_i = (S_{i_1})^{2^{C_{i_2}}} \times S_{i_2}$
4: end for
5: $a^{-1}(x) \leftarrow (S_t)^2$
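A runnable counterpart of Algorithms 3.1 and 3.2 is sketched below in Python (the report's own implementation is the MATLAB function in Appendix A). The sketch assumes integers as coefficient bit vectors and stores the intermediate values in a dictionary keyed by the chain element, so that `S[c]` holds $a^{2^c-1}$. Function names are illustrative.

```python
def gf2_mulmod(a, b, f):
    """Return a(x) * b(x) mod f(x) over GF(2)."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() - 1 >= m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod


def addition_chain(n):
    """Halve-or-decrement chain of Algorithm 3.1, returned in ascending order."""
    chain = [n]
    while chain[-1] > 1:
        c = chain[-1]
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    return chain[::-1]


def itoh_tsujii_inverse(a, f):
    """Inverse of a nonzero a(x) modulo an irreducible f(x) of degree m >= 2."""
    m = f.bit_length() - 1
    if a == 0 or m < 2:
        raise ValueError("need a nonzero element and a field of degree >= 2")
    S = {1: a}                             # S[c] = a^(2^c - 1)
    chain = addition_chain(m - 1)
    for prev, cur in zip(chain, chain[1:]):
        step = cur - prev                  # equals prev (doubling) or 1 (increment)
        x = S[prev]
        for _ in range(step):              # x = S[prev]^(2^step)
            x = gf2_mulmod(x, x, f)
        S[cur] = gf2_mulmod(x, S[step], f)  # Equation 3.5
    last = S[m - 1]
    return gf2_mulmod(last, last, f)       # final squaring, Equation 3.4


print(bin(itoh_tsujii_inverse(0b10101, 0b100101)))    # 0b11010
print(hex(itoh_tsujii_inverse(0x10000E2, 0x2000009)))  # 0x54ed9e
```

For the $m = 25$ example of Table 2.2, the chain for 24 is 1, 2, 3, 6, 12, 24, so the sketch uses five multiplications along the chain plus the final squaring, and it returns the same inverse as the EEA.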


3.3 An example of Itoh-Tsujii algorithm

The inversion operation in $GF(2^{233})$ is illustrated with an example. To calculate $a^{-1}$ in $GF(2^{233})$ with $m = 233$, we use the addition chain $C = \{C(t), \cdots, C(2), C(1)\}$ with $t$ elements.

From Algorithm 3.1, we have $C(1) = m - 1 = 232$. Since $C(1) = 232$ is an even number, $C(2) = C(1)/2 = 116$. If $C(i)$ is odd, $C(i+1)$ follows the rule $C(i+1) = C(i) - 1$.

Therefore, we obtain the addition chain with length $t = 10$: $C = \{1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232\}$.
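The chain above, and the multiplication count $\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$ quoted in Section 3.2, can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical function names; it follows the halve-or-decrement rule of Algorithm 3.1 and reverses the result so the chain reads from 1 upwards.

```python
def addition_chain(n):
    """Build the chain for n: halve when even, subtract 1 when odd."""
    chain = [n]
    while chain[-1] > 1:
        c = chain[-1]
        chain.append(c // 2 if c % 2 == 0 else c - 1)
    return chain[::-1]


def multiplication_count(n):
    """floor(log2(n)) + HW(n) - 1, with HW the Hamming weight of n."""
    return (n.bit_length() - 1) + bin(n).count("1") - 1


chain = addition_chain(232)
print(chain)                      # [1, 2, 3, 6, 7, 14, 28, 29, 58, 116, 232]
print(len(chain) - 1)             # 10 steps
print(multiplication_count(232))  # 10, since 232 = 11101000 in binary
```

For $m - 1 = 232$ the binary representation has four ones and seven bits after the leading one, so both counts give 10.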

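Table 3.1: Inverse of $a \in GF(2^{233})$ using an addition chain [1]

| Step | $S_{V_i}(a)$ | $S_{V_j+U_k}(a)$ | Exponentiation |
|---|---|---|---|
| 1 | $S_1(a)$ | | $a$ |
| 2 | $S_2(a)$ | $S_{1+1}(a)$ | $(S_1)^{2^{1}} S_1 = a^{2^{2}-1}$ |
| 3 | $S_3(a)$ | $S_{2+1}(a)$ | $(S_2)^{2^{1}} S_1 = a^{2^{3}-1}$ |
| 4 | $S_6(a)$ | $S_{3+3}(a)$ | $(S_3)^{2^{3}} S_3 = a^{2^{6}-1}$ |
| 5 | $S_7(a)$ | $S_{6+1}(a)$ | $(S_6)^{2^{1}} S_1 = a^{2^{7}-1}$ |
| 6 | $S_{14}(a)$ | $S_{7+7}(a)$ | $(S_7)^{2^{7}} S_7 = a^{2^{14}-1}$ |
| 7 | $S_{28}(a)$ | $S_{14+14}(a)$ | $(S_{14})^{2^{14}} S_{14} = a^{2^{28}-1}$ |
| 8 | $S_{29}(a)$ | $S_{28+1}(a)$ | $(S_{28})^{2^{1}} S_1 = a^{2^{29}-1}$ |
| 9 | $S_{58}(a)$ | $S_{29+29}(a)$ | $(S_{29})^{2^{29}} S_{29} = a^{2^{58}-1}$ |
| 10 | $S_{116}(a)$ | $S_{58+58}(a)$ | $(S_{58})^{2^{58}} S_{58} = a^{2^{116}-1}$ |
| 11 | $S_{232}(a)$ | $S_{116+116}(a)$ | $(S_{116})^{2^{116}} S_{116} = a^{2^{232}-1}$ |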

The computational process is illustrated in Table 3.1, where $V_i$ are the integers in the addition chain, $V_j = V_{i-1}$ and $V_i = V_j + U_k$.

From Equations 3.4 and 3.5, we have $S_{V_j+U_k} = (S_{V_j})^{2^{U_k}} \cdot S_{U_k}$, where $S_{V_i} = a^{2^{V_i}-1}$. Thus, we can rewrite $S_{V_i}(a)$ as:

$$S_{V_i}(a) = S_{V_j+U_k} = S_{V_{i-1}+U_k} = (S_{V_j})^{2^{U_k}} \cdot S_{U_k} = a^{2^{V_i}-1}$$

As shown in Figure 3.2, we obtain the value of $S_{232}$, and the inverse of $a$ is $(S_{232})^2$.

It may be noted that the Itoh-Tsujii Algorithm for the field $GF(2^m)$ requires a large number of squarings, which reduces its efficiency.

Chapter 4 MATLAB Implementation

4.1 MATLAB results

Inverters based on the EEA and Itoh-Tsujii algorithms are implemented in MATLAB as shown in Appendix A. In MATLAB, a polynomial is represented as a vector. For instance, to calculate the inverse of $a(x) = x^4 + x^2 + 1$ with the irreducible polynomial $f(x) = x^5 + x^2 + 1$ in $GF(2^5)$, we input a(x)=[(MSB) 1 0 1 0 1 (LSB)] and f(x)=[(MSB) 1 0 0 1 0 1 (LSB)]. After running the MATLAB code, the results of the EEA and Itoh-Tsujii algorithms are both [(MSB) 1 1 0 1 0 (LSB)], which implies that the inverse $a^{-1}(x)$ of $a(x)$ is $x^4 + x^3 + x$.
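The reported result can be verified independently of the MATLAB code by multiplying $a(x)$ by the claimed inverse and reducing modulo $f(x)$. The Python sketch below mirrors the MSB-first vector convention described above; the conversion helpers and `gf2_mulmod` are hypothetical names introduced for the illustration.

```python
def bits_to_int(bits):
    """Pack an MSB-first coefficient vector into an int (bit k = coeff of x^k)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def int_to_bits(value, width):
    """Unpack an int into an MSB-first coefficient vector of the given width."""
    return [(value >> k) & 1 for k in range(width - 1, -1, -1)]


def gf2_mulmod(a, b, f):
    """Return a(x) * b(x) mod f(x) over GF(2)."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() - 1 >= m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod


a = bits_to_int([1, 0, 1, 0, 1])       # x^4 + x^2 + 1
f = bits_to_int([1, 0, 0, 1, 0, 1])    # x^5 + x^2 + 1
inv = bits_to_int([1, 1, 0, 1, 0])     # x^4 + x^3 + x, the reported inverse
print(int_to_bits(gf2_mulmod(a, inv, f), 5))  # [0, 0, 0, 0, 1], i.e. the element 1
```

The product reduces to 1, which confirms that $x^4 + x^3 + x$ is the inverse of $x^4 + x^2 + 1$ modulo $x^5 + x^2 + 1$.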

To compare the performance of inverters based on the EEA and Itoh-Tsujii algorithms, the timeit function and the stopwatch timer functions tic and toc are used to measure how long the MATLAB code of each algorithm takes to run.

Tables 4.1 and 4.2 present the MATLAB execution time of the EEA and Itoh-Tsujii algorithms on two different processors. The platform used in Table 4.1 is a Dell Optiplex 9020 computer with a 4th generation Intel Core-i7-4790 3.6 GHz quad-core processor and 16 GB of RAM. The platform used in Table 4.2 is a Sony SVF142171SCW computer with a 3rd generation Intel Core-i5-3337 1.8 GHz dual-core processor and 8 GB of RAM. The results show that both algorithms run faster on the quad-core processor than on the dual-core processor.

As the key length increases, the execution time increases within an acceptable range. The performance of the EEA based inverters implemented in MATLAB is considered promising from these results.

The execution time of the Itoh-Tsujii algorithm increases as the key size $m$ increases. For $m$ smaller than 23, the Itoh-Tsujii algorithm performs efficiently. However, when the key size becomes greater than 23, the Itoh-Tsujii algorithm is not as efficient as the EEA.

Table 4.1: Execution time of EEA and Itoh-Tsujii Algorithms on a quad-core processor

| m | 7 | 20 | 23 | 25 | 27 | 31 |
|---|---|---|---|---|---|---|
| EEA time (ms) | 2 | 11 | 14 | 15 | 16 | 18 |
| Itoh-Tsujii time (ms) | 3.2 | 28 | 70 | 160 | 356 | 1007 |

Table 4.2: Execution time of EEA and Itoh-Tsujii Algorithms on a dual-core processor

| m | 7 | 20 | 23 | 25 | 27 | 31 |
|---|---|---|---|---|---|---|
| EEA time (ms) | 3 | 25 | 29 | 33 | 34 | 35 |
| Itoh-Tsujii time (ms) | 12 | 49 | 118 | 375 | 845 | 2345 |
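To make the gap between the two algorithms easier to read, the measurements in Tables 4.1 and 4.2 can be turned into ratios of Itoh-Tsujii time to EEA time. The short Python sketch below only restates the tabulated values; the variable and function names are illustrative.

```python
m_values = [7, 20, 23, 25, 27, 31]

# Execution times in milliseconds, copied from Tables 4.1 and 4.2.
quad_core = {"EEA": [2, 11, 14, 15, 16, 18],
             "Itoh-Tsujii": [3.2, 28, 70, 160, 356, 1007]}
dual_core = {"EEA": [3, 25, 29, 33, 34, 35],
             "Itoh-Tsujii": [12, 49, 118, 375, 845, 2345]}


def slowdown(times):
    """Ratio of Itoh-Tsujii time to EEA time for each field size m."""
    return [it / eea for it, eea in zip(times["Itoh-Tsujii"], times["EEA"])]


for name, times in (("quad-core", quad_core), ("dual-core", dual_core)):
    for m, ratio in zip(m_values, slowdown(times)):
        print(f"{name}: m = {m:2d}, Itoh-Tsujii / EEA = {ratio:.1f}")
```

On both platforms the ratio is above 1 for every listed value of m, and at m = 31 it reaches roughly 56 on the quad-core processor and 67 on the dual-core processor.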

4.2 Analysis and comparison

For an efficient implementation of ECC, it is very important to carry out finite field operations faster and with fewer resources. The inversion operation consumes most of the time and resources. Therefore, the speed of inversion has a great impact on the computation time of ECC. Both the EEA and the Itoh-Tsujii algorithm have been effective in achieving fast inversion.

The Extended Euclidean Algorithm finds the inverse in binary fields using repeated binary polynomial division operations. Since performing the division is time-consuming, the EEA replaces the division with shifts and subtractions, which can be implemented efficiently.

The Itoh-Tsujii algorithm performs inversion by a sequence of multiplications and squarings. To reduce the number of multiplications, an addition chain can be used to carry out the computation of the multiplicative inverse. With the addition chain, the Itoh-Tsujii algorithm computes the inverse in less time using a recursive re-arrangement of finite field operations.

In terms of speed, EEA based inverters provide an efficient way to compute the inverse in the binary field, since they mainly use shifts and subtractions. The Itoh-Tsujii algorithm has a higher computational complexity than the EEA since it requires many multiplication and squaring operations to compute the inverse. Therefore, EEA based inverters require less computational work in polynomial bases. The results reveal that both are very efficient for key sizes smaller than 23. When the key size becomes greater than 23, however, EEA based inverters are much faster.

Chapter 5 Conclusion

Finite field arithmetic is used in a variety of applications, including coding theory and cryptography. Compared to other arithmetic operations in finite fields, inversion is the most time-consuming operation. Efficient implementation of inversion is therefore a challenging problem. In general, the most common methods to compute inversion are based on the Itoh-Tsujii algorithm and the EEA.

The Euclidean Algorithm is a set of instructions for finding the greatest common divisor of any two positive numbers. The EEA is an extension of the Euclidean Algorithm that computes the greatest common divisor and finds the multiplicative inverse using repeated division operations.

The Itoh-Tsujii Algorithm is based on Fermat’s little theorem. This algorithm performs the inversion by a series of multiplications and squarings. To reduce the number of multiplications, the Itoh-Tsujii Algorithm uses an addition chain to perform the inversion more efficiently.

To perform inversion in finite fields, other schemes have also been proposed, such as Wiener-Hopf equation based inverters. Morii [20] proved that solving the discrete-time Wiener-Hopf equation is equivalent to performing division over finite fields. The hardware efficiency of these inverters is not comparable with that of Itoh-Tsujii and Extended Euclidean based inverters.

This report provides a literature review of inverters based on the EEA and the Itoh-Tsujii Algorithm. These two common classes of inverters, which are widely used for cryptographic purposes, are illustrated with examples. The MATLAB implementation of the EEA and the Itoh-Tsujii Algorithm has been presented in this report. The execution times in MATLAB show that the EEA is more efficient than the Itoh-Tsujii algorithm in polynomial bases.

For future work, the optimization of inverters based on the Itoh-Tsujii Algorithm for large key sizes might be a reasonable starting point. Finding a parallel version of the Itoh-Tsujii algorithm or using an optimal addition chain might be useful to speed up performance.

Appendix A

function [q,r]=func_divide(A,F)
% F is divided by A
% obtain the quotient and the remainder
q=1;
r=1;
da=func_poly_degree(A);
df=func_poly_degree(F);
if da>df
    q=0;
    r=F;
elseif da==0
    q=F;
    r=0;
else
    while (df>=da)
        if r==0
            break
        end
        dfa=df-da;
        B=[A,zeros(1,dfa)]; % left shift A by amount of dfa
        r=xor(B,F);
        r=deletezeros(r);
        % Calculate the quotient
        dfr=func_poly_degree(F)-func_poly_degree(r);
        if(func_poly_degree(r)>=da)
            q=[q,zeros(1,dfr)];
            q(end)=1;
        else
            q=[q,zeros(1,dfa)];
        end
        F=r;
        df=func_poly_degree(F);
    end
end
end

function inv_a=func_EEA(A,F)
% Extended Euclidean algorithm
% Algorithm finds the inverse of an element A in F_{2^m}.
% F is the irreducible polynomial
FF=F;
AA=A;
g1=0;
g2=1;
r=1;
C=F;
while 1
    [q,r]=func_divide(A,F);
    if r==0
        break
    end
    g3=g1;
    g1=g2;
    B=func_poly_mult(g2,q);
    delta=func_poly_degree(B)-func_poly_degree(g3);
    g3=[zeros(1,delta),g3];
    g2=xor(B,g3);
    F=A;
    A=r;
end
inv_a=g2; % The inverse of A
% test if A multiplied by the inverse of A equals 1
mul=func_poly_mult(inv_a,AA);
[q,r]=func_divide(FF,mul); % mod F
if r==1
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end

function inv_a=main_inv(A,F)
% Itoh-Tsujii Algorithm
m=length(F)-1;
i=1;
b=cell(20);
% Generate addition chain c(i)
c(i)=m-1;
while c(i)>1
    if (mod(c(i),2)==0)
        c(i+1)=c(i)/2;
        i=i+1;
    else
        c(i+1)=c(i)-1;
        i=i+1;
    end
end
b{c(i)}=A;
while i>1
    l=c(i-1)-c(i);
    p=func_square(b{c(i)},l,F);
    b{c(i-1)}=func_poly_mult(p,b{l},m,F);
    i=i-1;
end
% inverse of A is
inv_a=func_poly_mult(b{c(i)},b{c(i)},m,F);
% test if A multiplied by the inverse of A (inv_a) equals 1
p=func_poly_mult(inv_a,A,m,F); % p=a*a^(-1)
flag=find(p~=0,1); % index of the first nonzero coefficient
p=p(flag:end); % remove leading zeros
if(p==1)
    fprintf('\n inv_a*a=1, the answer is correct\n')
end
end

Bibliography

[1] A. A. Zadeh, “Division and inversion over finite fields,” in Cryptography and Security in Computing. InTech, 2012.

[2] D. Hankerson, A. J. Menezes, and S. Vanstone, Guide to elliptic curve cryptography. Springer Science & Business Media, 2006.

[3] I. H. Hazmi, F. Zhou, F. Gebali, and T. F. Al-Somani, “Review of elliptic curve processor architectures,” in Communications, Computers and Signal Processing (PACRIM), 2015 IEEE Pacific Rim Conference on. IEEE, 2015, pp. 192–200.

[4] A. K. Daneshbeh and M. A. Hasan, “A class of unidirectional bit serial systolic architectures for multiplicative inversion and division over GF(2^m),” IEEE Transactions on Computers, vol. 54, no. 3, pp. 370–380, 2005.

[5] Z. Yan, D. V. Sarwate, and Z. Liu, “High-speed systolic architectures for finite field inversion,” Integration, the VLSI Journal, vol. 38, no. 3, pp. 383–398, 2005.

[6] A. P. Fournaris and O. Koufopavlou, “Applying systolic multiplication–inversion architectures based on modified Extended Euclidean algorithm for GF(2^k) in elliptic curve cryptography,” Computers and Electrical Engineering, vol. 33, no. 5, pp. 333–348, 2007.

[7] C. Rebeiro, S. S. Roy, D. S. Reddy, and D. Mukhopadhyay, “Revisiting the Itoh-Tsujii inversion algorithm for FPGA platforms,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 8, pp. 1508–1512, 2011.

[8] L. Li and S. Li, “Fast inversion in GF(2^m) with polynomial basis using optimal addition chains,” in Circuits and Systems (ISCAS), 2017 IEEE International Symposium on. IEEE, 2017, pp. 1–4.

[9] F. Rodríguez-Henríquez, G. Morales-Luna, N. A. Saqib, and N. Cruz-Cortés, “Parallel Itoh–Tsujii multiplicative inversion algorithm for a special class of trinomials,” Designs, Codes and Cryptography, vol. 45, no. 1, pp. 19–37, 2007.

[10] B. Rashidi, R. R. Farashahi, and S. M. Sayedi, “High-performance and high-speed implementation of polynomial basis Itoh–Tsujii inversion algorithm over GF(2^m),” IET Information Security, vol. 11, no. 2, pp. 66–77, 2017.

[11] J. Vliegen, N. Mentens, J. Genoe, A. Braeken, S. Kubera, A. Touhafi, and I. Verbauwhede, “A compact FPGA-based architecture for elliptic curve cryptography over prime fields,” in Application-specific Systems Architectures and Processors (ASAP), 2010 21st IEEE International Conference on. IEEE, 2010, pp. 313–316.

[12] M. Olofsson, VLSI Aspects on Inversion in finite fields. Department of Electrical Engineering, Linköpings university, 2002.

[13] I. H. Hazmi, “Project: EEA-based polynomial inversion over GF(2^m): FPGA design and implementation,” ECE, University of Victoria, Tech. Rep., 2015.

[14] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases,” Information and computation, vol. 78, no. 3, pp. 171–177, 1988.

[15] A. Ibrahim, F. Gebali, and T. F. Al-Somani, “Systolic array architectures for Sunar–Koç optimal normal basis type II multiplier,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp. 2090–2102, 2015.

[16] F. Gebali and T. Al-Somani, “Finite field multiplication using reordered normal basis multiplier,” in Broadband and Wireless Computing, Communication and Applications (BWCCA), 2011 International Conference on. IEEE, 2011, pp. 320–326.

[17] D. J. Bernstein and T. Lange, “Type-II optimal polynomial bases,” in International Workshop on the Arithmetic of Finite Fields. Springer, 2010, pp. 41–61.

[18] B. Sunar and C. K. Koç, “An efficient optimal normal basis type II multiplier,”

[19] F. Rodriguez-Henriquez, N. Cruz-Cortes, and N. Saqib, “A fast implementation of multiplicative inversion over GF(2^m),” in Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on, vol. 1. IEEE, 2005, pp. 574–579.

[20] M. Morii, M. Kasahara, and D. L. Whiting, “Efficient bit-serial multiplication and the discrete-time Wiener-Hopf equation over finite fields,” IEEE Transactions on Information Theory, vol. 35, no. 6, pp. 1177–1183, 1989.
