Fast prime field arithmetic using novel large integer representation


by

Bader Hammad Alhazmi

B.Sc., King Abdulaziz University, 1999

M.Sc., Concordia University, 2011

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

in the Department of Electrical and Computer Engineering


Supervisory Committee

Dr. Fayez Gebali, Supervisor

(Department of Electrical and Computer Engineering)

Dr. Atef Ibrahim, Co-Supervisor

Dr. Andrew Rowe, Outside Member (Department of Mechanical Engineering)


ABSTRACT

Large integers are used in several key areas, such as the RSA (Rivest-Shamir-Adleman) and elliptic curve public-key cryptographic systems. Achieving higher levels of security requires larger key sizes, and this becomes a limiting factor in prime finite field GF(p) arithmetic because operations on large integers suffer from the long carry propagation problem. Large integer representation has a direct impact on the efficiency of the calculations and of the hardware and software implementations. Attempts to use different representations, such as residue number systems, suffer from their own problems. In this dissertation, we propose a novel and efficient attribute-based large integer representation scheme capable of efficiently representing the large integers that are commonly used in cryptography, such as the five NIST primes and the Pierpont primes used in supersingular isogeny Diffie-Hellman (SIDH) post-quantum cryptography. Moreover, we propose algorithms for this new representation to perform arithmetic operations such as conversion from and to binary representation, two’s complement, left shift, number comparison, addition/subtraction, modular addition/subtraction, modular reduction, multiplication, and modular multiplication. Extensive numerical simulations and software implementations are carried out to verify the performance of the new number representation. Results show that large integer arithmetic operations are faster in the proposed attribute-based representation than in binary and residue number representations. This makes the proposed representation suitable for cryptographic applications on embedded systems and IoT devices with limited resources, where it can provide a better security level.


Co-Authorship

[1] Bader Alhazmi and Fayez Gebali, “Fast Large Integer Modular Addition in GF(p) Using Novel Attribute-Based Representation,” IEEE Access, vol. 7, pp. 58704-58719, Dec. 2019. DOI: 10.1109/ACCESS.2019.2914641

[2] Bader Alhazmi and Fayez Gebali, “Attribute-Based Large Integer Representation for GF(p) Arithmetic,” Springer Journal of Cryptographic Engineering, submitted, Manuscript ID: JCEN-S-18-00079, Aug. 2018.

[3] Bader Alhazmi and Fayez Gebali, “Fast Large Integer Multiplication Using Novel Attribute-Based Representation,” IEEE Transactions on Computers, submitted, Manuscript ID: TC-2019-02-0097, Feb. 2019.

[4] Bader Alhazmi, Fayez Gebali, and Atef Ibrahim, “Fast Large Integer Modular Multiplication Using Novel Attribute-Based Representation,” to be submitted to IEEE Transactions on Computers.


List of Tables

Table 4.1 X-Cases for addition/subtraction function.
Table 4.2 Y-Cases for addition/subtraction function.
Table 4.3 Speedup of attribute addition relative to RCA binary addition.
Table 4.4 Speedup of attribute addition relative to binary-Kogge Stone addition.
Table 4.5 Speedup of attribute addition relative to RNS addition with conversions delays.
Table 4.6 Speedup of attribute addition relative to RNS addition without conversions delays.
Table 5.1 Speedup of attribute-attribute multiplication relative to classical multiplication.
Table 5.2 Speedup of attribute-NP multiplication relative to classical multiplication.
Table 5.3 Speedup of attribute-attribute multiplication relative to Karatsuba multiplication.
Table 5.4 Speedup of attribute-NP multiplication relative to Karatsuba multiplication.
Table 5.5 Speedup of attribute-attribute multiplication relative to RNS multiplication with conversions delay.
Table 5.6 Speedup of attribute-NP multiplication relative to RNS multiplication with conversions delay.
Table 5.7 Speedup of attribute-attribute multiplication relative to RNS multiplication without conversions delay.
Table 5.8 Speedup of attribute-NP multiplication relative to RNS multiplication without conversions delay.
Table 6.1 Speedup of attribute-attribute modular multiplication relative to
Table 6.2 Speedup of attribute-NP modular multiplication relative to interleaved modular multiplication based on KSA.


List of Figures

Figure 3.1 An integer with three strings of ones.
Figure 3.2 Conversion from binary to attribute representation for R-bit zone size. Case when R = 8.
Figure 4.1 Attribute addition example.
Figure 4.2 Attribute addition and subtraction block diagram. (a) Pre-processing step to select add or subtract operation. (b) Block diagram for the addition operation at iteration i.
Figure 4.3 Addition delays for m bits field size with 4 bits word size.
Figure 5.3 Multiplication delays for m bits field size with 4 bits word size.
Figure 5.4 Multiplication delays for m bits field size with 8 bits word size.
Figure 5.5 Multiplication delays for m bits field size with 16 bits word size.
Figure 6.3 Modular multiplication delays for m bits field size with 4 bits word size.
Figure 6.6 Modular multiplication delays for m bits field size with 4 bits word size.
Figure 6.7 Modular multiplication delays for m bits field size with 4 bits word size.
Figure 6.8 Modular multiplication delays for m bits field size with 4 bits word size.
Figure 7.1 RSA Public-key Cryptographic System using attribute representation.


List of Algorithms

Algorithm 3.1 Pseudo code for conversion from an R-bit binary number to its attributes representation.
Algorithm 3.2 Pseudo code for conversion from attribute representation to m-bit binary number.
Algorithm 3.3 Pseudo code to find One’s complement (OnesComp) for a number based on their attributes.
Algorithm 3.4 Pseudo code for Left-Shift Algorithm (LeftShift).
Algorithm 3.5 Pseudo code for comparing two positive numbers (Compare) based on their attributes.
Algorithm 4.1 Pseudo code for attribute addition and subtraction (AddSub) for signed large numbers based on their attributes.
Algorithm 4.2 Pseudo code to reduce a number modulo p (Reduce): N mod p.
Algorithm 4.3 Pseudo code for m-bit Binary Addition (BiAdd) for large integers.
Algorithm 4.4 Pseudo code for w-bit Binary Kogge-Stone Addition (BiAddKsa) for large integers.
Algorithm 4.5 Pseudo code for RNS Addition (RNSAdd) for large integers.
Algorithm 5.1 Pseudo code for Attribute-Attribute Multiplication Algorithm (MultAtt).
Algorithm 5.2 Pseudo code for Attribute-NP Multiplication Algorithm (MultAttNP).
Algorithm 5.3 Pseudo code for w-bit Classical Multiplication (ClMult) for large integers.
Algorithm 5.4 Pseudo code for RNS Multiplication (RNSMult) for large integers.
Algorithm 5.5 Pseudo code for Karatsuba Multiplication (KaMult) for large integers.
Algorithm 6.1 Pseudo code to reduce a number modulo p (Reduce): N mod p.
Algorithm 6.2 Pseudo code for Attribute-Attribute Modular Multiplication
Algorithm 6.3 Pseudo code for Attribute-NP Modular Multiplication Algorithm (ModMultAttNP).
Algorithm 6.4 Pseudo code for w-bit Interleaved Modular Multiplication (IntModMult)


List of Functions

Function      Input(s)                Output(s)     Algorithm
bi_2_att      N_bi, z                 N_att, L, F   3.1
att_2_bi      N_att, L, m             N_bi          3.2
OnesComp      N, L, m                 N'            3.3
LeftShift     N, L, n                 N'            3.4
Compare       N1, N2, L1, L2, m       G, E, L       3.5
AddSub        N1, N2, L1, L2, s, m    N3, L3        4.1
Reduce        N, p, LN, Lp, m         N'            4.2
BiAdd         N1, N2, w, m            N3            4.3
BiAddKsa      N1, N2, Cin, w          N3, Cout      4.4
RNSAdd        N1, N2, m, w            N3            4.5
MultAtt       N1, N2, L1, L2, m       N3            5.1
MultAttNP     N1, N2, L1, L2, m       N3            5.2
ClMult        N1, N2, w, m            N3            5.3
RNSMult       N1, N2, m, w            N3            5.4
KaMult        N1, N2, m, w            N3            5.5
ModMultAtt    p, N1, N2, L1, L2, m    N3            6.2
ModMultAttNP  p, N1, N2, L1, L2, m    N3            6.3
IntModMult    p, N1, N2, w, m         N3            6.4


List of Abbreviations

ALU   Arithmetic Logic Unit
ASIC  Application Specific Integrated Circuit
BKA   Brent-Kung Adder
CLA   Carry Lookahead Adder
CRT   Chinese Remainder Theorem
CSA   Carry Save Adder
CSK   Carry Skip Adder
CSL   Carry Select Adder
ECC   Elliptic-Curve Cryptography
FFT   Fast Fourier Transform
FHE   Fully Homomorphic Encryption
FPGA  Field Programmable Gate Array
GCD   Greatest Common Divisor
HCA   Han-Carlson Adder
IoT   Internet of Things
KSA   Kogge-Stone Adder
LFA   Ladner-Fischer Adder
LSA   Least-Significant Attribute
LSB   Least-Significant Bit
MRC   Mixed-Radix Conversion
MSA   Most-Significant Attribute
MSB   Most-Significant Bit
NIST  National Institute of Standards and Technology
PPF   Parallel Prefix Adder
RCA   Ripple Carry Adder
RNS   Residue Number System
RSA   Rivest-Shamir-Adleman public key cryptography system
SA    Sklansky Adder


ACKNOWLEDGEMENTS

In the name of Allah, the most Gracious, the most Merciful. All praise be to Almighty Allah, the lord of all the worlds for blessing me with guidance, patience, and perseverance to complete my Ph.D. journey.

This work would not have been possible without the support of many people. I would like to take this opportunity to express my thanks to those who helped me conduct my research and write this dissertation.

First and foremost, I would like to express my sincere thanks to my supervisor and mentor, Dr. Fayez Gebali for his guidance, advice, encouragement, and support during my Ph.D. journey. It would not be possible to finish my research without his valuable help and constructive comments. I feel honored by being able to work with him and look forward to a continued research relationship in the future.

Also, I would like to thank my supervisory committee members, Dr. Atef Ibrahim and Dr. Andrew Rowe for taking part of their precious time reviewing my dissertation and providing me with their insightful comments and invaluable suggestions.

Furthermore, I would like to thank my brothers and sisters for their love, prayers, and endless support. Special thanks to my wife and children for their love, patience, and believing in me.

Finally, I would like to thank my sponsor, the Kingdom of Saudi Arabia represented by Umm Al-Qura University for granting me the graduate program scholarship.


DEDICATION

To my parent, in memoriam, my brothers and sisters,

my wife, and my children.


Chapter 1 Introduction

1.1 Problem Statement

The world is witnessing the development of a novel networking paradigm that has evolved from advances in the internet, its applications and services, and several other technologies and communication techniques. This paradigm is known as the Internet of Things (IoT). The IoT can be envisioned as a very large-scale network infrastructure of billions of smart Things (e.g., sensors, actuators, smartphones). According to Cisco, about 500 billion devices are expected to be connected to the internet by 2030 [1]. These smart Things are heterogeneous, mobile, uniquely addressable, and resource-constrained (in processing, memory, and power). The new paradigm will have a major impact on the everyday life of potential users in domains such as transportation, healthcare, logistics, environmental monitoring, urban planning, and many others. However, realizing its potential benefits and enabling the widespread deployment of the IoT faces many technological challenges [2, 3]. One of the most critical challenges is security and privacy, both for the participating Things and for the data they gather or consume.

In a typical IoT scenario, when Things are required to collaborate to share data or information, they should start by establishing secure channels between them to ensure the confidentiality and integrity of the exchanged data and messages. They should also be able to check whether a request comes from an authentic and authorized Thing. The traditional security mechanisms for providing confidentiality, integrity, authentication, and availability can be used to protect these devices from external attacks. However, these mechanisms are very costly in terms of computational complexity, memory, and power requirements when used in IoT devices.

Several security mechanisms are based on public-key cryptography systems such as Rivest-Shamir-Adleman (RSA), Elliptic-Curve Cryptography (ECC), ElGamal, and Fully Homomorphic Encryption (FHE). Recently, these cryptographic systems proved vulnerable to side-channel and quantum computing attacks. Higher levels of security are achieved by using larger cryptographic key sizes. However, larger key sizes require more processing resources and incur longer delays that may affect the usability of the system.

Public-key cryptography algorithms mainly depend on modular arithmetic operations over very large primes, such as those suggested by NIST [4] and the Pierpont primes used for Supersingular Isogeny Diffie-Hellman (SIDH) [5]. The efficiency of modular arithmetic for large numbers in IoT devices is determined mainly by the representation used for the large numbers and by the algorithms that perform the arithmetic operations. In cryptography, for instance, modular multiplication over GF(p) is performed either as a two-step operation, in which a multiplication is followed by a reduction, or as one interleaved operation, in which the intermediate multiplication results are kept reduced with respect to the modulus [6, 7]. The time needed to multiply two integers grows with their sizes, which has a great impact on the overall efficiency of the system for integers that are hundreds or thousands of bits long. Therefore, there is great demand for algorithms that speed up computation with large integers and yet are efficient enough to be used by resource-limited devices.

Large integer representation has a direct impact on the efficiency of the calculations in hardware and software implementations. However, operations on large integers suffer from long carry propagation delays. The residue number system (RNS) is a commonly used representation to solve the carry propagation problem. However, RNS has several limitations, such as the conversion from/to binary representation and the difficulty of determining basic properties such as the magnitude, sign, or overflow of a number, or even of comparing two numbers. These limitations restrict the use of this system to implement cryptographic primitives in IoT applications.

In this dissertation, we propose a new number representation to overcome the disadvantages of the RNS number representation. Also, this number representation has the advantage of reducing the computation time and consumed energy for short word sizes. This makes it more suitable for use in IoT applications characterized by limited computational resources.


1.2 Research Objectives

The general aim of this research is to deal with the long carry propagation when large integers are used as inputs to an arithmetic operation. The research objectives for this work are:

1. Investigate the techniques used to deal with the carry propagation problem for large integers and study how the techniques perform the arithmetic operations.

2. Develop a novel scheme to efficiently represent large integers for cryptographic applications.

3. Develop algorithms that use the novel scheme to perform the necessary arithmetic operations.

4. Verify the performance of the developed algorithms using numerical simulations and software implementations.

1.3 Contributions

The contributions of this research are:

1. Propose a new non-positional attribute-based large integer representation.

2. Develop algorithms to perform basic arithmetic operations based on the new representation. The following basic arithmetic operations will be considered:

• Converting a binary number to attribute representation, and vice versa.

• Two’s complement of a number: −N

• Left-shift (multiply by 2): 2N

• Numbers comparison: N1 ? N2, where ‘?’ represents the comparison operator

3. Develop algorithms to perform addition/subtraction operation and modular addition/subtraction operation based on the proposed attribute representation.

4. Develop algorithms to perform multiplication operation based on the proposed attribute representation.

5. Develop algorithms to perform modular multiplication operation based on the proposed attribute representation.


1.4 Dissertation Organization

The dissertation is organized as follows.

Chapter 2 provides a background and a review of related works dealing with arithmetic operations using large integers.

Chapter 3 presents the proposed attribute-based large integer representation. Moreover, it presents the developed algorithms based on the new representation to perform the basic arithmetic operations such as conversions to/from attribute representation, two’s complement of a number, left-shifting a number, and numbers comparison.

Chapter 4 presents details of large integer addition/subtraction operation based on attribute representation. Also, it presents a detailed performance evaluation of the proposed algorithms compared to common binary adder algorithms for large integers.

Chapter 5 presents details of large integer multiplication operation based on attribute representation. Also, it presents a detailed performance evaluation of the proposed algorithms compared to common binary multiplication algorithms for large integers.

Chapter 6 presents details of large integer modular multiplication operation based on attribute representation. Also, it presents a detailed performance evaluation of the proposed algorithms compared to common binary modular multiplication algorithms for large integers.

Chapter 7 concludes the dissertation and shows possible areas for improvement and highlights the future work.


Chapter 2 Literature Review

The performance of arithmetic operations is affected mainly by two factors: the number representation and the algorithms used to carry out the operations. Any attempt to improve performance will therefore use a more efficient representation, more intelligent algorithms, or both. The literature on arithmetic operations contains a significant number of publications covering these three approaches to performance improvement.

2.1 Representation of Integer Numbers

The choice of integer representation affects the performance of the basic arithmetic operations, especially when dealing with large integers.

An m-bit integer number N can be represented as

$$N = -n_{m-1}\,2^{m-1} + \sum_{i=0}^{m-2} n_i\,2^{i} \tag{2.1}$$

where the most significant bit (m − 1) is a sign bit. The remaining bits indicate the magnitude of the number. A large integer with m > w, where w is the machine word size, will be represented as a sequence of ⌈m/w⌉ words. The delay of arithmetic operations with such a representation is extremely long due to the long carry propagation.
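To make the word-level view concrete, the following minimal Python sketch splits a nonnegative integer into ⌈m/w⌉ words and adds two such numbers word by word. The helper names `to_words`, `add_words`, and `from_words` are illustrative assumptions rather than functions from this dissertation, and sign handling is omitted.

```python
def to_words(n, m, w):
    """Split a nonnegative m-bit integer into ceil(m/w) words, least significant first."""
    count = -(-m // w)  # integer form of ceil(m / w)
    mask = (1 << w) - 1
    return [(n >> (i * w)) & mask for i in range(count)]


def add_words(a, b, w):
    """Add two equal-length word lists; the carry ripples from one word to the next."""
    mask = (1 << w) - 1
    carry = 0
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & mask)   # low w bits stay in this word
        carry = s >> w         # overflow moves to the next word
    return out, carry


def from_words(words, w):
    """Reassemble an integer from its words (least significant first)."""
    return sum(word << (i * w) for i, word in enumerate(words))
```

Adding 1 to 2^64 − 1 with w = 8 is a worst case for this scheme: the carry has to travel through eight consecutive words before the result settles, which is the sequential dependency that the rest of this chapter is concerned with.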

The Residue Number System (RNS) [8], a non-positional number representation, has been proposed to overcome the carry delay problem. RNS allows representing large integers as a set of smaller integers to achieve fast and parallel arithmetic operations for addition, subtraction, and multiplication since they are performed on shorter operands [9–11]. This property has attracted the attention of many researchers to utilize RNS in many applications in digital signal processing systems [12,13], error detection and correction and fault-tolerant applications [14], embedded systems [15], and asymmetric cryptography systems [16–22].

However, RNS suffers from several serious drawbacks:

1. It is difficult and/or slow to convert data between the RNS and their binary equivalents [23–26].

2. The sign of the data is not easily determined [27–30].

3. It is not easy to compare two numbers in RNS domain to determine equality or inequality [31–35].

4. It is hard to detect an overflow that might happen as a result of an operation [36,37].

5. It is necessary to perform the expensive conversion to binary representation after each arithmetic operation to be able to extract the state of the arithmetic results.

6. Scaling operation, which is a division by a constant, is difficult to implement with RNS representation [38–41].

7. It is inefficient to perform the division operation with RNS representation due to the combination of iterated subtractions and comparisons operations [42–47].
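The trade-off described above can be illustrated with a small Python sketch. The moduli set and the function names are assumptions chosen only for this example; practical RNS designs use much larger, carefully selected moduli. Addition and multiplication act independently on each residue, while recovering the binary value requires a Chinese Remainder Theorem (CRT) reconstruction.

```python
from math import prod

MODULI = (13, 17, 19, 23)  # pairwise coprime; illustrative choice only


def to_rns(n):
    """Forward conversion: one residue per modulus."""
    return tuple(n % m for m in MODULI)


def rns_add(a, b):
    """Residue-wise addition; no carry crosses between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Residue-wise multiplication; each channel is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Reverse conversion by CRT; valid for values below the product of the moduli."""
    M = prod(MODULI)
    total = 0
    for residue, m in zip(r, MODULI):
        Mi = M // m
        total += residue * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M
```

The residue tuples carry no positional information, so questions such as which of two numbers is larger, or whether a result has exceeded the dynamic range, cannot be answered by inspecting individual residues. In this sketch they can only be answered after calling `from_rns`, which mirrors drawbacks 1 to 5 in the list above.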

2.2 Integer Arithmetic

2.2.1 Addition/Subtraction

Binary addition algorithms are used in most, if not all, low-power embedded processors as well as high-performance servers. This is due to the simplicity of adding binary numbers. However, the carry propagation problem plagues binary addition and is the main factor that determines the speed of operation of the processor ALU. The binary ripple carry adder (RCA) is the simplest and slowest type of binary adders. However, it is widely used and still serves as the basis for comparing the performance of other addition algorithms.

There are many types of fast binary adders that have been proposed in the literature such as carry skip adders (CSK) [48, 49], carry select adders (CSL) [50, 51], carry save adders (CSA) [52], carry lookahead adders (CLA) [53, 54], and parallel prefix adders (PPF) [55–60]. Many variations and combinations on these basic binary adders have also been proposed such as the hybrid carry lookahead/carry select adder [61], hybrid ripple carry/hierarchical carry lookahead type 2 adder [62] and hybrid parallel prefix/carry select and skip adder [63].

Most of the proposed works in the area of PPF adders focus on improving the performance of the hardware implementation, in terms of delay, area, and power, of the basic PPF adder algorithms mentioned earlier or of combinations of them with the basic binary adders. A comparative analysis of PPF adders is given in [64–67].
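As an illustration of how a parallel prefix adder shortens the carry chain, the sketch below models the Kogge-Stone prefix network in Python using bitwise operations on whole integers. It is a software model of the carry logic under the assumption of a zero carry-in, not a hardware description, and the function name is hypothetical.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit nonnegative integers with a Kogge-Stone prefix network.

    Returns (sum mod 2**width, carry_out). Carry-in is assumed to be zero.
    """
    mask = (1 << width) - 1
    g = a & b & mask      # generate: both operand bits are 1
    p = (a ^ b) & mask    # propagate: exactly one operand bit is 1
    p0 = p                # original propagate bits, needed for the final sum
    dist = 1
    while dist < width:   # about log2(width) prefix levels
        # combine each position with the group that ends dist bits below it
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
    carries = g << 1      # carry into bit i is the group generate of bits 0..i-1
    total = (p0 ^ carries) & mask
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

Each pass of the loop corresponds to one level of the prefix tree, so the number of levels grows logarithmically with the operand width, whereas the ripple carry adder needs one stage per bit.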

2.2.2 Multiplication

In binary number representation, the most popular algorithms for integer multiplication are classical schoolbook multiplication [68], Comba multiplication [69], Karatsuba multiplication [70], Toom-Cook multiplication [71,72], Schönhage-Strassen multiplication [73], and Fürer multiplication [74]. The classical schoolbook and Comba algorithms are the simplest multiplication algorithms and do not use any optimization technique. They use shift and addition operations to multiply the inputs. Comba differs from the classical algorithm in the way it handles carry propagation within the partial products, but without any improvement in the total time needed for the operation. The Karatsuba algorithm was the first to incorporate an optimization technique to improve multiplication time. It uses a divide-and-conquer approach that splits each large integer into two smaller integers. Finding the product of these smaller parts requires three multiplication operations and several extra addition operations. The Toom-Cook algorithm is a generalized version of the Karatsuba algorithm. It divides large integers into three or more smaller parts and finds the product in a similar way. The performance of the Toom-Cook algorithm depends on the number of splits used. Generally, its performance is better than that of the Karatsuba algorithm, but at the cost of a more complex implementation. The Schönhage-Strassen and Fürer algorithms use the fast Fourier transform (FFT) multiplication technique [75] to speed up the multiplication operation. These algorithms require conversions to and from the FFT domain, an extra overhead that affects the overall multiplication time.
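The divide-and-conquer step of the Karatsuba algorithm can be sketched in a few lines of Python. This is a generic textbook formulation, not the KaMult algorithm evaluated later in this dissertation; the function name and the `threshold_bits` parameter are assumptions for illustration.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply nonnegative integers using three half-size products per level."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y  # base case: operands are short enough to multiply directly
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    hi = karatsuba(x_hi, y_hi, threshold_bits)
    lo = karatsuba(x_lo, y_lo, threshold_bits)
    # (x_hi + x_lo)(y_hi + y_lo) - hi - lo equals x_hi*y_lo + x_lo*y_hi
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo
```

The third product replaces the two cross products of the schoolbook method, at the price of the extra additions and subtractions mentioned above.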

2.2.3 Modular Multiplication: Multiplication Followed by Reduction

In this approach multiplication is performed using any multiplication techniques discussed in Section 2.2.2.

For modular reduction, there are four popular algorithms, namely classical, lookup-table based [76], Barrett [77], and Montgomery [78] modular reduction. Classical modular reduction requires a division operation to find the remainder, which is a very expensive operation, especially for large integers. Lookup-table based modular reduction uses pre-computed tables to speed up the reduction operation. However, the effectiveness of this method becomes limited when dealing with large integers due to the huge storage requirement. The Barrett reduction algorithm replaces the expensive division operation with less expensive multiplication operations and one precomputation step for a given modulus. This modulus dependency makes the algorithm suitable for cases in which many reductions are performed with a single modulus [7]. The Montgomery algorithm is similar to Barrett reduction in that it replaces the expensive division operation with less expensive operations. The main idea of Montgomery reduction is to convert the inputs to the Montgomery domain, perform the modular multiplication within the new domain, and then convert back to the original domain. Due to the overhead introduced by the conversions to/from the Montgomery domain, this reduction method is more effective for operations in which many multiplications are performed on given inputs, such as modular exponentiation, which is a core operation in the RSA cryptosystem [7].
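A minimal Python sketch of Barrett reduction is given below to show how the per-modulus precomputation removes division from the reduction step. It is a simplified textbook variant, assuming an input below the square of the modulus; the function names are illustrative and do not come from the cited works.

```python
def barrett_setup(n):
    """Precompute, once per modulus n > 1, the shift k and the scaled reciprocal mu."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n   # the only division; reused for every reduction
    return k, mu


def barrett_reduce(x, n, k, mu):
    """Return x mod n for 0 <= x < n*n using a multiplication and a shift."""
    q = (x * mu) >> (2 * k)    # estimate of x // n that is never too large
    r = x - q * n
    while r >= n:              # small correction for the underestimated quotient
        r -= n
    return r
```

For example, a product of two residues modulo a fixed prime can be reduced with `barrett_reduce(a * b, p, k, mu)` after a single call to `barrett_setup(p)`, which matches the single-modulus use case described above.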

2.2.4 Modular Multiplication: Interleaved Multiplication and Reduction

The main advantage of interleaved modular multiplication is that the intermediate results are always kept reduced with respect to the modulus. However, this advantage comes with the challenge of the carry propagation delay incurred when performing addition over large operands. The standard interleaved modular multiplication was first introduced by Blakley in 1983 [79], where the author proposed replacing the expensive division operation with addition and comparison operations to iteratively reduce the intermediate partial products. Since then, several improvements to this version of the algorithm have been proposed [80–91]. These improvements focus on using more efficient addition, comparison, and shift operations, or on parallelizing the whole operation.
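The shift, add, compare, and subtract pattern of interleaved modular multiplication can be sketched as follows. This is a generic formulation in the spirit of the approach described above, not a transcription of any cited algorithm, and the function name is an assumption.

```python
def interleaved_modmul(a, b, n):
    """Return (a * b) mod n for 0 <= a, b < n, scanning a from its most significant bit."""
    r = 0
    for i in range(a.bit_length() - 1, -1, -1):
        r <<= 1                # shift the running partial product
        if (a >> i) & 1:
            r += b             # add the multiplicand when the current bit is set
        # r was below n before this step, so now r < 3n:
        # at most two subtractions bring it back below n
        if r >= n:
            r -= n
        if r >= n:
            r -= n
    return r
```

Every iteration performs one full-width addition and up to two full-width comparisons and subtractions, which is why the cost of carry propagation dominates when the operands are hundreds of bits long.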

The authors in [81] proposed an improvement to the addition operation. They proposed the use of carry-save addition, where the partial product is the sum of the carry and the sum words. The comparison in their algorithm is achieved by subtracting the modulus from the partial product, and reduction is required if the result is less than zero. Since the result is in the form of a carry-sum pair, testing the sign of the result requires adding both the sum and the carry, which introduces more delay. To eliminate this delay, the authors introduced a technique to estimate the sign of the partial product represented by a carry-sum pair. The authors in [83] and [84] proposed a version of the interleaved modular multiplier that uses a carry-save addition technique and a more efficient comparison based on pre-computed values stored in lookup tables. The authors in [85] proposed a version of the interleaved modular multiplier that exploits the built-in fast carry chains in FPGA implementations to improve the performance of the operation. The authors in [89] and [90] proposed faster interleaved modular multipliers based on Montgomery and Barrett reduction techniques. The authors in [86] proposed a version of the interleaved modular multiplier based on radix-4, radix-8, and Booth encoding techniques. Their algorithm reduces the total number of iterations compared with the classical interleaved modular multiplier. The authors in [87] proposed a parallelized version of the interleaved modular multiplier. The algorithm computes all possible intermediate results in parallel and confirms the correct result using a sign detection technique, because numbers are represented as carry-sum pairs. The authors in [88] proposed replacing the comparison step, which in the worst case requires scanning the operands from the most to the least significant bit, with a simpler operation. Their modification improves the operation delay and reduces the required number of addition operations. The authors in [91] proposed a version of the interleaved modular multiplier that performs the multiplication by scanning the multiplier from the least to the most significant bit. The algorithm requires two reductions, one after the shift operation and the other for the accumulated partial product. However, the proposed algorithm was for polynomial basis multiplication over binary finite fields GF(2^m). In this work, we propose an improved interleaved modular multiplication algorithm for large integers. The proposed algorithm is based on our attribute-based representation for large integers.


Chapter 3 Attribute-Based Integer Representation

In this chapter, we propose a new number representation to deal with the problem of carry propagation and to overcome the disadvantages of the RNS number representation discussed in Chapter 2. We call the new representation the Attribute Representation. Algorithms are also developed for converting a binary number to attribute representation and vice versa, along with other basic operations that are used by the other arithmetic operations.

3.1 Attribute Representation

An m-bit two’s complement integer has the binary representation:

$$N = -n_{m-1}\,2^{m-1} + \sum_{i=0}^{m-2} n_i\,2^{i} \tag{3.1}$$

Figure 3.1 shows the m-bit representation where the red box indicates the sign bit, the blue boxes indicate the non-zero bits, and the white boxes indicate the zero bits. The least-significant attribute is shown as LSA and the most-significant attribute is shown as MSA.

Figure 3.1: An integer with three strings of ones. MSB: most-significant bit, LSB: least-significant bit.

Referring to Fig. 3.1 and Eq. (3.1), we can represent N in terms of the non-zero values of $n_i$ as:

$$N = \sum_{i=\beta_2}^{\alpha_2} 2^{i} + \sum_{i=\beta_1}^{\alpha_1} 2^{i} + \sum_{i=\beta_0}^{\alpha_0} 2^{i} \tag{3.2}$$

In general when we have L contiguous strings of 1’s, the above equation becomes:

$$N = \sum_{i=\beta_{L-1}}^{\alpha_{L-1}} 2^{i} + \sum_{i=\beta_{L-2}}^{\alpha_{L-2}} 2^{i} + \cdots + \sum_{i=\beta_0}^{\alpha_0} 2^{i} \tag{3.3}$$

In that case, L represents the total number of attributes of the integer N.

Our proposed integer representation is based on the above equation. We use a short-hand notation to represent and to store in memory the number in terms of the summation limits indicated in Eq. (3.3):

$$N \equiv \{(\alpha_{L-1}, \beta_{L-1}), (\alpha_{L-2}, \beta_{L-2}), \cdots, (\alpha_0, \beta_0)\} \tag{3.4}$$

We call the tuple or pair $(\alpha_i, \beta_i)$ the i-th attribute of the number.

Equation (3.4) indicates that the number N can be represented by the set of (α,β) attributes. This list N can be stored as an abstract data type single- or doubly-linked list [92].

In Eq. (3.4), the attribute (α0, β0) is called the least-significant attribute (LSA), as shown in Fig. 3.1. Similarly, the attribute (αL−1, βL−1) is called the most-significant attribute (MSA), as shown in Fig. 3.1.
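The mapping between a binary value and its attribute list in Eqs. (3.3) and (3.4) can be illustrated with the short Python sketch below. It is not the conversion procedure of Algorithm 3.1; it is a bit-by-bit illustration restricted to nonnegative integers, with hypothetical function names, and it stores the attributes in a Python list with the LSA first (index i holds the i-th attribute).

```python
def to_attributes(n):
    """Return [(alpha_0, beta_0), ..., (alpha_{L-1}, beta_{L-1})] for a nonnegative integer."""
    attrs = []
    pos = 0
    while n:
        if n & 1:
            beta = pos                 # lowest bit of this string of ones
            while n & 1:               # walk to the end of the string
                n >>= 1
                pos += 1
            attrs.append((pos - 1, beta))
        else:
            n >>= 1
            pos += 1
    return attrs


def from_attributes(attrs):
    """Evaluate Eq. (3.3): a run from beta to alpha sums to 2**(alpha + 1) - 2**beta."""
    return sum((1 << (alpha + 1)) - (1 << beta) for alpha, beta in attrs)
```

For instance, the bit pattern 0111001101 yields the attributes (0, 0), (3, 2), and (8, 6). Under this sketch, the NIST P-256 prime 2^256 − 2^224 + 2^192 + 2^96 − 1 is described by only three attributes, (95, 0), (192, 192), and (255, 224), which suggests why primes with long runs of ones are compact in this representation.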

Assuming our integers are represented using m bits, each α or β is an integer value that requires a bits, where:

$$a = \log_2 m \tag{3.5}$$

The following lemma proves the relationship between the values of α and β of attributes of a number.

Lemma 1. For a given number, the values of α and β must satisfy the following inequalities:

$$\alpha_i \ge \beta_i \tag{3.6}$$

$$\beta_{i+1} > \alpha_i + 1, \quad 0 \le i < L \tag{3.7}$$

Proof: From (3.3), the least value for the upper limit of each summation is when $\alpha_i = \beta_i$. Hence we have in general $\alpha_i \ge \beta_i$. This proves (3.6).

Since there is at least one zero bit separating any two contiguous strings of ones, the value of $\beta_{i+1}$ cannot equal $\alpha_i + 1$ and must exceed it. This proves (3.7).

The following lemma gives an upper limit to the maximum number of attributes of an integer.

Lemma 2. The maximum number of attributes of an integer is m/2.

Proof: Assume all the attributes have the same length $l_a$:

$$l_a = \alpha - \beta + 1$$

Assume also that the number of zeros between attributes ($l_0$) is equal. The average number of attributes will be given by:

$$L_{avg} = \frac{m}{l_a + l_0}$$

The maximum number of attributes is when $l_a$ and $l_0$ are at their least possible values. From Lemma 1, the least value for $l_a$ is 1 and the least value of $l_0$ is 1. Hence the maximum number of attributes is given by:

$$L_{max} = \frac{m}{2}$$

The following lemma shows how the sign of an integer number can be inferred from its attributes representation.

Lemma 3. Given an m-bit integer with L attributes, the sign of that integer can be inferred from the value of $\alpha_{L-1}$.

Proof: From (3.3), $\alpha_{L-1}$ represents the position of the most significant 1 in the number N. When $\alpha_{L-1} = m - 1$ we have a negative number since the sign bit at location m − 1 is 1, according to (3.1). Conversely, when $\alpha_{L-1} < m - 1$ we have a positive number since the sign bit at location m − 1 is zero.

The following lemma proves how to infer if a number is even or odd based on its attributes.

Lemma 4. Given an m-bit integer with L attributes, whether the number is even or odd can be inferred from the value of $\beta_0$.

Proof: From (3.3), $\beta_0$ represents the position of the least significant 1 in the number N. When $\beta_0 = 0$ we have an odd number since the bit at location 0 is 1, according to (3.1). Conversely, when $\beta_0 > 0$ we have an even number since the bit at location 0 is zero.
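Lemmas 1, 3, and 4 can be checked mechanically. The following Python sketch assumes, purely for illustration, that a number is held as a list of (alpha, beta) tuples ordered MSA first as in Eq. (3.4); the function names are hypothetical and are not taken from the dissertation.

```python
from typing import List, Tuple

Attribute = Tuple[int, int]  # (alpha, beta); assumed layout for this sketch


def is_valid(attrs: List[Attribute], m: int) -> bool:
    """Check the inequalities of Lemma 1 for an m-bit number (MSA first)."""
    for alpha, beta in attrs:
        if not 0 <= beta <= alpha <= m - 1:      # Eq. (3.6) plus a range check
            return False
    for (_, beta_hi), (alpha_lo, _) in zip(attrs, attrs[1:]):
        if not beta_hi > alpha_lo + 1:           # Eq. (3.7): at least one zero gap
            return False
    return True


def is_negative(attrs: List[Attribute], m: int) -> bool:
    """Lemma 3: the sign bit is set exactly when the MSA has alpha = m - 1."""
    return bool(attrs) and attrs[0][0] == m - 1


def is_odd(attrs: List[Attribute]) -> bool:
    """Lemma 4: bit 0 is set exactly when the LSA has beta = 0."""
    return bool(attrs) and attrs[-1][1] == 0


n = [(7, 6), (3, 0)]  # bit pattern 11001111 with m = 8
print(is_valid(n, 8), is_negative(n, 8), is_odd(n))
```

An empty list is treated here as the number 0, which is neither negative nor odd.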


3.2 Representation of the Number 0

We need to consider how to represent the integer 0 when our number is stored as a linked list in (3.4). For the case of a linked list representation, we assign the start address the value NULL.

3.3 Attribute Representation of NIST Primes

We illustrate in this section how the NIST primes are expressed using the proposed attribute-based representation. NIST proposed five primes for elliptic curve cryptography [4]:

$$\text{P-192} = 2^{192} - 2^{64} - 1 \tag{3.8}$$

$$\text{P-224} = 2^{224} - 2^{96} + 1 \tag{3.9}$$

$$\text{P-256} = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1 \tag{3.10}$$

$$\text{P-384} = 2^{384} - 2^{128} - 2^{96} + 2^{32} - 1 \tag{3.11}$$

$$\text{P-521} = 2^{521} - 1 \tag{3.12}$$

The binary representations of these five primes are given by:

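$$\text{P-192} = \sum_{i=65}^{191} 2^i + \sum_{i=0}^{63} 2^i \tag{3.13}$$

$$\text{P-224} = \sum_{i=96}^{223} 2^i + 2^0 \tag{3.14}$$

$$\text{P-256} = \sum_{i=224}^{255} 2^i + \sum_{i=192}^{192} 2^i + \sum_{i=0}^{95} 2^i \tag{3.15}$$

$$\text{P-384} = \sum_{i=129}^{383} 2^i + \sum_{i=96}^{127} 2^i + \sum_{i=0}^{31} 2^i \tag{3.16}$$

$$\text{P-521} = \sum_{i=0}^{520} 2^i \tag{3.17}$$

The attribute-based representations of the five NIST primes are given by:

$$\text{P-192} \equiv \{(191, 65), (63, 0)\} \tag{3.18}$$

$$\text{P-224} \equiv \{(223, 96), (0, 0)\} \tag{3.19}$$

$$\text{P-256} \equiv \{(255, 224), (192, 192), (95, 0)\} \tag{3.20}$$

$$\text{P-384} \equiv \{(383, 129), (127, 96), (31, 0)\} \tag{3.21}$$

$$\text{P-521} \equiv \{(520, 0)\} \tag{3.22}$$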

From the above equations, it becomes obvious that the attribute-based representation of the NIST primes is very concise: each prime requires at most three attributes, compared to storing all m bits of the prime.
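As a quick numerical check of Eqs. (3.8) to (3.22), the Python sketch below evaluates each attribute list with Eq. (3.3) and compares it with the closed form of the prime. The storage estimate is an assumption of this sketch: it charges ⌈log2 m⌉ bits per α or β field, and the helper names are illustrative only.

```python
# name: (prime value, m, attributes from Eqs. (3.18)-(3.22), MSA first)
NIST_PRIMES = {
    "P-192": (2**192 - 2**64 - 1, 192, [(191, 65), (63, 0)]),
    "P-224": (2**224 - 2**96 + 1, 224, [(223, 96), (0, 0)]),
    "P-256": (2**256 - 2**224 + 2**192 + 2**96 - 1, 256,
              [(255, 224), (192, 192), (95, 0)]),
    "P-384": (2**384 - 2**128 - 2**96 + 2**32 - 1, 384,
              [(383, 129), (127, 96), (31, 0)]),
    "P-521": (2**521 - 1, 521, [(520, 0)]),
}


def attribute_value(attrs):
    """Evaluate Eq. (3.3); a run of ones from beta to alpha is 2^(alpha+1) - 2^beta."""
    return sum((1 << (alpha + 1)) - (1 << beta) for alpha, beta in attrs)


def storage_bits(attrs, m):
    """Bits for all alpha/beta fields, assuming ceil(log2(m)) bits per field."""
    field_bits = (m - 1).bit_length()
    return 2 * field_bits * len(attrs)


for name, (prime, m, attrs) in NIST_PRIMES.items():
    matches = attribute_value(attrs) == prime
    print(name, "matches:", matches,
          "| attribute bits:", storage_bits(attrs, m), "vs binary bits:", m)
```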

3.4 Conversion from Binary to Attribute Representation

Unlike other number representations, converting from binary to attribute representation is a simple process. Converting a binary number to attribute representation requires simple scanning for contiguous strings of ones, starting from the LSB to the MSB, or vice versa. The length of each string can vary from 1 to m. For each string, the position of the starting 1 is assigned to the β value of that string, and the position of the last 1 is assigned to the α value of that string.
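The sequential scan described above can be sketched in a few lines of Python. The sketch assumes the input is a list of bits with the LSB at index 0 and returns the attributes MSA first, matching Eq. (3.4); the function name is hypothetical.

```python
def bits_to_attributes(bits):
    """Scan from LSB to MSB and emit one (alpha, beta) pair per run of ones."""
    attrs = []
    beta = None                          # start of the current run, if any
    for i, b in enumerate(bits):
        if b and beta is None:
            beta = i                     # first 1 of a new run
        elif not b and beta is not None:
            attrs.append((i - 1, beta))  # run ended at the previous bit
            beta = None
    if beta is not None:                 # run reaches the MSB
        attrs.append((len(bits) - 1, beta))
    attrs.reverse()                      # MSA first, as in Eq. (3.4)
    return attrs


n = 2_066_400
bits = [(n >> i) & 1 for i in range(24)]
print(bits_to_attributes(bits))
```

The scan visits every bit once, so its running time grows linearly with m, which is the cost that the zone-based parallel scheme discussed next aims to reduce.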

However, the scanning process takes a long time to complete, especially for large numbers. One option is to divide the m bits into multiple zones, each of length R bits, so that conversion from binary to attribute proceeds in parallel in each zone.

The binary to attribute conversion in each zone can be performed in a binary tree with $O(\log_2 R)$ complexity. Algorithm 3.1 shows the pseudo code for conversion from an R-bit binary number to its attribute representation. The algorithm requires $\log_2 R$ levels of iteration, indexed by k. In the first level, where k = 0 (Lines 3 – 22), each pair of binary inputs is converted to a single attribute and the flag F is set to one, indicating that an attribute was generated from the conversion. Otherwise, the flag F is set to zero. The maximum length L is one at this level of conversion. In the following levels, where k > 0 (Lines 24 – 50), the algorithm combines any two attributes that are adjacent (β of the next attribute is equal to α of the first attribute + 1), generates the required flag, sets the length, and updates M for the next iteration. Figure 3.2 shows an example of conversion from binary to attribute representation for an R-bit zone when R = 8.


Algorithm 3.1 Pseudo code for conversion from an R-bit binary number to its attributes representation (bi_2_att).

Input: Nbi(z) = [b(0), b(1), · · · , b(R − 1)] where R is the number of bits in zone z and 1 ≤ z ≤ ⌈m/R⌉
Output: Natt, L, F

1: k ← 0    ▷ First Level
2: { Start of parallel code section
3: for i = 0 : 2 : R − 2 do
4:     if b(i) = 1 then    ▷ b(i) is i-th bit
5:         β(k, i) ← i
6:         F(k, i) ← 1; L(k, i) ← 1
7:         if b(i + 1) = 1 then
8:             α(k, i) ← i + 1
9:         else
10:            α(k, i) ← i
11:        end if
12:    else
13:        if b(i + 1) = 1 then
14:            β(k, i) ← i + 1; α(k, i) ← i + 1
15:            F(k, i) ← 1; L(k, i) ← 1
16:        else
17:            β(k, i) ← {}; α(k, i) ← {}
18:            F(k, i) ← 0; L(k, i) ← 0
19:        end if
20:        M(k, i) ← {(α(k, i), β(k, i))}
21:    end if
22: end for
23: } End of parallel code section
24: for k = 1 to lg R − 1 do    ▷ Next Levels
25:    { Start of parallel code section
26:    for j = 0 : 2^(k+1) : R − 1 do
27:        if F(k − 1, j) = 0 and F(k − 1, j + 2) = 0 then
28:            M(k + 1, j) ← {}
29:            F(k + 1, i) ← 0; L(k + 1, i) ← 0
30:        else if F(k − 1, j) = 0 and F(k − 1, j + 2) = 1 then
31:            M(k + 1, j) ← M(k, j + 2)
32:            F(k + 1, j) ← 1; L(k + 1, j) ← 1
33:        else if F(k − 1, j) = 1 and F(k − 1, j + 2) = 0 then
34:            M(k + 1, j) ← M(k, j)
35:            F(k + 1, j) ← 1; L(k + 1, j) ← 1
36:        else
37:            X ← end(M(k, j))
38:            Y ← first(M(k, j + 2))
39:            if Y.β = X.α + 1 then
40:                Mtemp ← {(X.β, Y.α)}
41:                delete(X, M(k, j))
42:                delete(Y, M(k, j + 2))
43:                M(k + 1, j) ← M(k, j + 2) ‖ Mtemp ‖ M(k, j)
44:                F(k + 1, j) ← 1; L(k + 1, j) ← L(k, j) + L(k, j + 2) − 1
45:            else
46:                M(k + 1, j) ← M(k, j + 2) ‖ M(k, j)
47:                F(k + 1, j) ← 1; L(k + 1, j) ← L(k, j) + L(k, j + 2)
48:            end if
49:        end if
50:    end for
51:    } End of parallel code section
52: end for
53: Natt ← M
54: return Natt, L, F

Figure 3.2: Conversion from binary to attribute representation for R-bit zone size. Case when R = 8.
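The tree-structured idea behind Algorithm 3.1 can be illustrated with a small recursive Python sketch. This is not a transcription of Algorithm 3.1: it starts from single bits rather than bit pairs, runs sequentially rather than in parallel, and does not track the F and L bookkeeping. The names `merge` and `convert` are hypothetical.

```python
def merge(low, high):
    """Combine attribute lists (MSA first) of two adjacent zones.

    `low` covers the less significant zone. If the top run of `low` touches
    the bottom run of `high`, the two runs are fused into one attribute.
    """
    if low and high and high[-1][1] == low[0][0] + 1:
        fused = (high[-1][0], low[0][1])
        return high[:-1] + [fused] + low[1:]
    return high + low


def convert(bits, offset=0):
    """Convert bits (LSB first) to attributes by halving and merging."""
    if len(bits) == 1:
        return [(offset, offset)] if bits[0] else []
    half = len(bits) // 2
    low = convert(bits[:half], offset)
    high = convert(bits[half:], offset + half)
    return merge(low, high)


zone = [0, 1, 1, 1, 0, 1, 1, 0]   # R = 8, bit pattern 01101110
print(convert(zone))
```

The two halves are independent until the final merge, so in a parallel setting the recursion depth, which is log2 R for a zone of R bits, bounds the number of sequential merge levels.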


3.5 Conversion from Attribute Representation to Binary

Unlike other number representations, converting from attribute representation to binary is a simple process. Converting the attributes of a number to its binary equivalent requires building an m-bit string of zeros. For each (α, β) pair, a string of ones is inserted in the binary number starting at position β and ending at position α. This process can be done in parallel since there is no overlap between attribute positions. Algorithm 3.2 shows the pseudo code for conversion from attribute representation to an m-bit binary number.

Algorithm 3.2 Pseudo code for conversion from attribute representation to m-bit binary number (att_2_bi).

Input: Natt, L, m, R, where R is the number of bits in each zone
Output: Nbi

1: M(0 : m − 1) ← 0
2: { Start of parallel code section
3: for i = 0 to m/R − 1 do
4:     for j = 0 to L(i) − 1 do
5:         M(β(j) : α(j)) ← 1
6:     end for
7: end for
8: } End of parallel code section
9: Nbi ← M
10: return Nbi
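A sequential Python sketch of the same idea is given below. It assumes a single zone and represents the binary number as a list of bits with the LSB at index 0; the function name is illustrative and the zone-level parallelism of Algorithm 3.2 is not modelled.

```python
def attributes_to_bits(attrs, m):
    """Build an m-bit list (LSB first) with ones at positions beta..alpha."""
    bits = [0] * m                                   # Line 1: all zeros
    for alpha, beta in attrs:
        if not 0 <= beta <= alpha < m:
            raise ValueError(f"attribute ({alpha}, {beta}) does not fit in {m} bits")
        bits[beta:alpha + 1] = [1] * (alpha - beta + 1)   # Line 5: fill the run
    return bits


bits = attributes_to_bits([(6, 5), (3, 1)], 8)
print("".join(str(b) for b in reversed(bits)))       # MSB first for display
```

Because valid attributes never overlap (Lemma 1), the order in which the runs are filled does not matter.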

3.6 Basic Operations using Attribute Representation

In this section, we present some of the basic operations that will be used by other operations. The basic operations are: two’s complement of a number, left-shift, and comparison between two numbers based on their attributes.

3.6.1 Attribute Two’s Complement Algorithm

In binary representation, two’s complement is used to accommodate negative numbers. The most-significant bit is reserved as a sign bit. A positive number has a zero in the sign bit, whereas a negative number has a one in the sign bit. There are two different ways to find the two’s complement of a number. The first method finds the one’s complement and then adds one. The second method scans for the first one from the least-significant bit and then complements all the succeeding bits. In this dissertation, we use the first method.

Algorithm 3.3 shows how to find the one’s complement of a number based on its attributes. The algorithm consists of three parts. The first part (Lines 1 – 4) inserts the complement attribute before the first attribute in the number N. The second part (Lines 5 – 8) finds the complement for the attributes in the number N. The third part (Lines 9 – 15) finds the complement attribute after the last attribute in the number N. Finally, the algorithm returns the one’s complement N′ of the input number N (Line 16).

After finding the one’s complement of the number, the next step is adding one to the result of the first step, N′. This step is done by using the algorithm for addition using attributes that is discussed in Chapter 4.

Algorithm 3.3 Pseudo code to find One’s complement (OnesComp) for a number based on their attributes.

Input: N, L, m
Output: N′

1: if βN(1) > 0 then
2:     βN′(1) ← 0
3:     αN′(1) ← βN(1) − 1
4: end if
5: for i = 2 to L − 1 do
6:     βN′(i) ← αN(i − 1) + 1
7:     αN′(i) ← βN(i) − 1
8: end for
9: if αN(L) < m − 1 then
10:    βN′(L) ← αN(L) + 1
11:    αN′(L) ← m − 1
12: else
13:    βN′(L) ← αN(L − 1) + 1
14:    αN′(L) ← βN(L) − 1
15: end if
16: return N′
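The one’s complement turns every gap between runs of ones into a run of ones and vice versa. The Python sketch below expresses that idea directly; it is an independent illustration under the assumption of MSA-first (alpha, beta) lists, not a line-by-line transcription of Algorithm 3.3, and the function name is hypothetical.

```python
def ones_complement(attrs, m):
    """Attributes of the bitwise complement of an m-bit number (MSA first)."""
    result = []
    next_free = 0                        # lowest position not yet accounted for
    for alpha, beta in reversed(attrs):  # walk from LSA to MSA
        if beta > next_free:
            # The zeros below this run become a run of ones.
            result.append((beta - 1, next_free))
        next_free = alpha + 1
    if next_free <= m - 1:
        # Zeros above the MSA, up to the sign bit, become ones.
        result.append((m - 1, next_free))
    result.reverse()
    return result


print(ones_complement([(6, 5), (3, 1)], 8))   # complement of 01101110
```

With this formulation the number 0 (an empty list) maps to the single attribute (m − 1, 0), and the all-ones pattern maps back to the empty list.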

3.6.2 Attribute Left-Shift Algorithm

It should be pointed out that arithmetic and logical left shift operations are identical. However, in binary representation, before applying the arithmetic left shift operator a check must be made that the amount of shift does not lead to overflow where sign extension and extra bits must be added on the left. We are not concerned with this pre-check operation here and assume in this section that it has been done. Since attribute-based representation is nonpositional, the overflow condition must be dealt with only when converting the number to binary representation. No pre-check is necessary for our new number system.

Shifting a number by n bits is equivalent to multiplying the number by $2^n$. The left-shift operation in attribute representation is an attribute-wise operation that adds the value n to all attributes. The values of α and β for each attribute in the number increase by n.

The pseudo code for the left-shift operation is shown in Algorithm 3.4, where N is the number to be shifted, L is the number of attributes in the number, n is the required amount of shift, and N′ is the left-shifted number.

Algorithm 3.4 Pseudo code for Left-Shift Algorithm (LeftShift).

Input: N, L, n
Output: N′

1: for i = 1 to L do
2:     β′(i) ← β(i) + n
3:     α′(i) ← α(i) + n
4: end for
5: return N′
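Algorithm 3.4 translates almost directly into Python. The sketch below assumes attributes are stored as (alpha, beta) tuples; the helper `value`, which evaluates Eq. (3.3), is added only to confirm that the shift multiplies the number by 2^n.

```python
def left_shift(attrs, n):
    """Add the shift amount n to alpha and beta of every attribute."""
    return [(alpha + n, beta + n) for alpha, beta in attrs]


def value(attrs):
    """Integer value of an attribute list, per Eq. (3.3)."""
    return sum((1 << (alpha + 1)) - (1 << beta) for alpha, beta in attrs)


original = [(20, 15), (10, 5)]
shifted = left_shift(original, 3)
print(shifted, value(shifted) == value(original) * 2**3)
```

Each attribute is updated independently of the others, so the work is proportional to the number of attributes L rather than to the bit length m.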

3.6.3 Attribute Comparison Algorithm

In general, comparing two numbers requires determination of their sign and magnitude. Unlike RNS, attribute-based representation allows us to determine the sign and magnitude of a number without conversion to the binary representation. The sign of the number can be determined according to Lemma 3. To compare the magnitudes of the two numbers, we need to compare the MSA of both numbers.

Equality E of the two numbers N1 and N2 is determined by the equation:

$$E = \begin{cases} 1 & \text{when } \begin{cases} L_1 = L_2 \\ \alpha_{N_1}(i) = \alpha_{N_2}(i), \ \forall\, 1 \le i \le L_1 \\ \beta_{N_1}(i) = \beta_{N_2}(i), \ \forall\, 1 \le i \le L_1 \end{cases} \\ 0 & \text{otherwise} \end{cases} \tag{3.23}$$

When N1 and N2 have opposite signs we have:

$$N_1 > N_2 \quad \text{when } \alpha_{L_1} < m - 1 \text{ and } \alpha_{L_2} = m - 1 \tag{3.24}$$

When both N1 and N2 are positive, Algorithm 3.5 is used to determine which number is greater than the other.

When both N1 and N2 are negative, Algorithm 3.5 can still be used to determine which number is greater than the other provided that the L and G outputs are exchanged.

Algorithm 3.5 Pseudo code for comparing two positive numbers (Compare) based on their attributes.

Input: N1, N2, L1, L2, m
Output: G, E, L

1: G ← 0; E ← 0; L ← 0    ▷ Initialization step
2: if L1 ≠ L2 or E = 0 then
3:     i ← L1; j ← L2    ▷ MSA first
4:     while i > 0 and j > 0 do
5:         if αN1(i) > αN2(j) then
6:             return G ← 1; E ← 0; L ← 0
7:         end if
8:         if αN1(i) < αN2(j) then
9:             return G ← 0; E ← 0; L ← 1
10:        end if
11:        if αN1(i) = αN2(j) then
12:            if βN1(i) > βN2(j) then
13:                return G ← 0; E ← 0; L ← 1
14:            end if
15:            if βN1(i) < βN2(j) then
16:                return G ← 1; E ← 0; L ← 0
17:            end if
18:            if βN1(i) = βN2(j) then
19:                i ← i − 1, j ← j − 1
20:                if αN1(i) > αN2(j) then
21:                    return G ← 1; E ← 0; L ← 0
22:                end if
23:                if αN1(i) < αN2(j) then
24:                    return G ← 0; E ← 0; L ← 1
25:                end if
26:                if αN1(i) = αN2(j) then
27:                    if βN1(i) > βN2(j) then
28:                        return G ← 0; E ← 0; L ← 1
29:                    end if
30:                    if βN1(i) < βN2(j) then
31:                        return G ← 1; E ← 0; L ← 0
32:                    end if
33:                end if
34:            end if
35:        end if
36:    end while
37: end if
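The comparison of two non-negative numbers can be condensed into a short Python sketch. It follows the same MSA-first ordering rule as Algorithm 3.5 (a larger α wins; for equal α, a smaller β wins) but is written independently, and its handling of lists of different length, where the number with leftover attributes is taken as greater, is an assumption of this sketch.

```python
def compare(n1, n2):
    """Return (G, E, L) for two non-negative numbers given MSA first."""
    for (a1, b1), (a2, b2) in zip(n1, n2):
        if a1 != a2:
            # The number whose run reaches the higher bit is greater.
            return (1, 0, 0) if a1 > a2 else (0, 0, 1)
        if b1 != b2:
            # Same top bit: the run extending further down (smaller beta) is greater.
            return (1, 0, 0) if b1 < b2 else (0, 0, 1)
    if len(n1) == len(n2):
        return (0, 1, 0)                 # every attribute matched, Eq. (3.23)
    # All shared attributes matched; the extra lower attributes add value.
    return (1, 0, 0) if len(n1) > len(n2) else (0, 0, 1)


print(compare([(20, 15), (10, 5)], [(17, 7)]))
```

The loop stops at the first differing attribute, so numbers that differ near the MSA are resolved without inspecting the remaining attributes.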


Chapter 4 Attribute-Based Large Integers Addition

Having proposed and defined the attributes for an integer and the basic operations with attributes in Chapter 3, we are able now to propose algorithms for performing finite-field arithmetic operations using the attributes. In this chapter, we discuss the addition and modular addition operations for large integers.

4.1 Attribute Addition/Subtraction Algorithm

The attribute-based addition/subtraction algorithm relies on comparing the locations of the α-β attribute pairs of both numbers and the current input carry Cin(i). The comparison generates two vectors, the X-vector and the Y-vector. The X-vector compares the locations of the current attributes, and the Y-vector compares the locations of the current attributes with the input carry Cin. The addition result is generated based on these two vectors. Table 4.1 shows the possible values for vector X and Table 4.2 shows the possible values for vector Y. The addition/subtraction algorithm proceeds by processing the LSA of both numbers first. The operation continues processing the attributes until the end of the attributes in one of the inputs. Then, one extra operation is required to deal with the last output carry Cout(i). If the last output carry Cout(i) = NULL, the remaining attributes in the non-empty number are appended as MSA to the final result and hence the final Cout is NULL. On the other hand, if the last output carry Cout(i) ≠ NULL, it is considered as an input carry to the next attribute. The result of this operation is appended along with the current Cout (if not equal to NULL) as MSA with the remaining attributes in the non-empty number. It should be mentioned that no more operations are required in the case of the current Cout ≠ NULL since there should be at least one bit gap between any contiguous attributes in a number. The final Cout for the operation will be NULL.

Figure 4.1 shows an example of adding two numbers N1 = 2,066,400 and N2 = 262,016 by using the attribute addition algorithm. The attribute representations of N1 and N2 are as follows:

$$N_1 \equiv \{(20, 15), (10, 5)\}$$

$$N_2 \equiv \{(17, 7)\}$$

Adding these two numbers requires two iterations. In the first iteration, when i = 1, the attributes n1(i) = (10, 5) and n2(i) = (17, 7) are added. The initial input carry in this case is Cin = NULL. As mentioned earlier, the attribute addition starts by comparing attribute positions and generates two vectors, the X-vector and the Y-vector. In this iteration, since the input carry Cin = NULL, only the X-vector is generated. According to Table 4.1, the X-vector value is X = [3313]. The addition result, using Algorithm 4.1, is

$$N'_3(1) \equiv \{(10, 8), (6, 5)\}$$

$$C'_{out}(1) \equiv \{(18, 18)\}$$

$N'_3(1)$ is part of the final result of the addition and $C'_{out}(1)$ will be used as the input carry for the next iteration, i.e., its value will be assigned to $C''_{in}(2)$.

In the second iteration, when i = 2, there are no more attributes in N2, so only the Y-vector is generated. According to Table 4.2, the Y-vector value is Y = [11113311]. The addition result, using Algorithm 4.1, is

$$N''_3(2) \equiv \{(17, 15)\}$$

$$C''_{out}(2) \equiv \{(21, 21)\}$$

$N''_3(2)$ is part of the final result of the addition and its value is appended as MSA to $N'_3(1)$. Since there are no more attributes in both numbers, $C''_{out}(2)$ is also appended to the final result as MSA. The final result of the addition N3 is as follows:

$$N_3 \equiv \{(21, 21), (17, 15), (10, 8), (6, 5)\} = 2{,}328{,}416$$
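The arithmetic of this worked example can be confirmed with ordinary integers. The Python sketch below does not implement Algorithm 4.1; it only evaluates the attribute lists quoted above with Eq. (3.3) and checks that each iteration, and the final result, balance. The helper name `value` is illustrative.

```python
def value(attrs):
    """Integer value of an attribute list: each run is 2^(alpha+1) - 2^beta."""
    return sum((1 << (alpha + 1)) - (1 << beta) for alpha, beta in attrs)


n1 = [(20, 15), (10, 5)]
n2 = [(17, 7)]

# Iteration 1: LSA of N1 plus the only attribute of N2.
partial_1, carry_1 = [(10, 8), (6, 5)], [(18, 18)]
iter1_ok = value([(10, 5)]) + value([(17, 7)]) == value(partial_1) + value(carry_1)

# Iteration 2: remaining attribute of N1 plus the carry from iteration 1.
partial_2, carry_2 = [(17, 15)], [(21, 21)]
iter2_ok = value([(20, 15)]) + value(carry_1) == value(partial_2) + value(carry_2)

# Final result: carry and partial results concatenated, MSA first.
n3 = carry_2 + partial_2 + partial_1
total_ok = value(n1) + value(n2) == value(n3) == 2_328_416

print(iter1_ok, iter2_ok, total_ok)
```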

Figure 4.1: Attribute addition example.

The pseudo code for attribute addition and subtraction is shown in Algorithm 4.1. It is important to point out here that the variables Cin and Cout are not simple bits but represent attributes with equal values of α and β for each of them.

Lines 2 – 8 set up the input carry Cin to the algorithm based on the desired operation: add (s = 0, Cin = NULL) or subtract (s = 1, Cin = (0, 0)). When s = 1, the algorithm finds the one’s complement of N2 using Algorithm 3.3 and replaces N2 with it. Adding the input carry Cin = (0, 0), which represents the value one, then completes the two’s complement of N2.

Lines 9 – 12 deal with the case when either of the numbers N1 and N2 is equal to zero. The non-zero number is assigned to N3 and the output carry Cout is set to NULL. Lines 14 – 19 deal with the case when both numbers are greater than zero. In this case, the algorithm iterates through the attributes in both numbers, two attributes at a time, and adds them after generating the X-vector and Y-vector by using the Generate_X_Cases and Generate_Y_Cases functions, respectively. The result of the addition N3(i) is a partial result and it may contain more than one attribute. The attributes in N3(i) are appended to the final addition result. The output carry Cout(i) is considered as the input carry for the next iteration and its value is assigned to Cin(i + 1). The algorithm iterates until it reaches the end of the attributes list in one of the numbers.

Table 4.1: X-Cases for addition/subtraction function. Each element takes the value 1, 2, or 3 when the left quantity is greater than, equal to, or less than the right quantity, respectively.

| Element | Quantities compared | > | = | < |
|---|---|---|---|---|
| x1 | α1(i) and α2(i) | 1 | 2 | 3 |
| x2 | β1(i) and β2(i) | 1 | 2 | 3 |
| x3 | α1(i) and β2(i) | 1 | 2 | 3 |
| x4 | β1(i) and α2(i) | 1 | 2 | 3 |

If both numbers have the same number of attributes (Lines 20 – 23), the addition operation ends upon reaching the last attribute. If the output carry Cout(i) ≠ NULL, its value will be appended to N3 as MSA.

When the final attribute of one of the numbers is reached (Lines 24 – 52), the last step depends on the state of Cout(i). When Cout(i) = NULL, the remaining attributes in one of the inputs will be appended to N3. When Cout(i) ≠ NULL, one extra addition operation is required.

Figure 4.2 shows an overview of the attribute addition/subtraction operation. Figure 4.2.a shows the pre-processing step to select add or subtract operation based on the control signal s. This figure corresponds to lines 2 – 8 in Algorithm 4.1. Figure 4.2.b is a block diagram for the addition operation at iteration i. The addition operation is shown by the ADD block or the ADD( ) function in Algorithm 4.1. The add operation depends on the values of the X and Y–Cases summarized in Table 4.1 and Table 4.2, respectively. Figure 4.1 provided a concrete example of the operation of the algorithm.


Algorithm 4.1 Pseudo code for attribute addition and subtraction (AddSub) for signed large numbers based on their attributes.

Input: N1, N2, L1, L2, s, m
Output: N3, L3

1: temp ← 0; L3 ← 0;    ▷ Initialization step
2: if s = 0 then
3:     Cin(i − 1) ← NULL
4: else {s = 1}
5:     Cin(i − 1) ← (0, 0)
6:     N2′ ← OnesComp(N2, m);
7:     N2 ← N2′
8: end if
9: if L2 = 0 then
10:    N3 ← N1; Cout ← NULL; L3 ← L1
11: else if L1 = 0 then
12:    N3 ← N2; Cout ← NULL; L3 ← L2
13: else {L1 ≠ 0 and L2 ≠ 0}
14:    for i = 1 to min(L1, L2) do
15:        X[1 : 4] ← Generate_X_Cases(N1(i), N2(i))
16:        Y[1 : 8] ← Generate_Y_Cases(Cin(i − 1), N1(i), N2(i))
17:        (Cout(i), N3(i), temp) ← Add(Cin(i − 1), N1(i), N2(i), X, Y)
18:        L3 ← L3 + temp; Cin(i + 1) ← Cout(i)
19:    end for
20:    if Cout(i) ≠ NULL and L1 = L2 then
21:        N3 ← Cout(i) ‖ N3
22:        L3 ← L3 + 1
23:    end if
24:    if Cout(i) ≠ NULL and L1 ≠ L2 then
25:        Cin ← NULL; j ← min(L1, L2) + 1
26:        if L1 > L2 then
27:            X[1 : 4] ← Generate_X_Cases(N1(j), Cout(i))
28:            Y[1 : 8] ← Generate_Y_Cases(Cin, N1(j), Cout(i))
29:            (Cout(j), N3(j), temp) ← Add(Cin, N1(j), Cout(i), X, Y)
30:            L3 ← L3 + temp;
31:            if Cout(j) ≠ NULL then
32:                N3 ← N1(j + 1 : end) ‖ Cout(j) ‖ N3
33:                L3 ← L3 + L1 − j
34:            else
35:                N3 ← N1(j + 1 : end) ‖ N3
36:                L3 ← L3 + L1 − j − 1
37:            end if
38:        else if L1 < L2 then
39:            X[1 : 4] ← Generate_X_Cases(N2(j), Cout(i))
40:            Y[1 : 8] ← Generate_Y_Cases(Cin, N2(j), Cout(i))
41:            (Cout(j), N3(j), temp) ← Add(Cin, N2(j), Cout(i), X, Y)
42:            L3 ← L3 + temp;
43:            if Cout(j) ≠ NULL then
44:                N3 ← N2(j + 1 : end) ‖ Cout(j) ‖ N3
45:                L3 ← L3 + L2 − j
46:            else
47:                N3 ← N2(j + 1 : end) ‖ N3
48:                L3 ← L3 + L2 − j − 1
49:            end if
50:        end if
51:    end if
52: end if
53: return N3, L3

Figure 4.2: Attribute addition and subtraction block diagram. (a) Pre-processing step to select add or subtract operation based on the control signal s. (b) Addition operation at iteration i.

Table 4.2: Y-Cases for addition/subtraction function. Each element takes the value 1, 2, or 3 when the left quantity is greater than, equal to, or less than the right quantity, respectively.

| Element | Quantities compared | > | = | < |
|---|---|---|---|---|
| y1 | αin(i) and β1(i) − 1 | 1 | 2 | 3 |
| y2 | αin(i) and β1(i) | 1 | 2 | 3 |
| y3 | αin(i) and β2(i) − 1 | 1 | 2 | 3 |
| y4 | αin(i) and β2(i) | 1 | 2 | 3 |
| y5 | αin(i) and α1(i) | 1 | 2 | 3 |
| y6 | αin(i) and α1(i) − 1 | 1 | 2 | 3 |
| y7 | αin(i) and α2(i) | 1 | 2 | 3 |
| y8 | αin(i) and α2(i) − 1 | 1 | 2 | 3 |

4.2 Attribute Modular Addition/Subtraction Algorithm

When N1 and N2 are integers in GF(p), addition and subtraction have to be done modulo p. Modular addition N3 = N1 + N2 mod p and subtraction N3 = N1 − N2 mod p can be computed as shown in Algorithm 4.1 with an additional step for reduction modulo p. The pseudo code to reduce a number modulo p is shown in Algorithm 4.2.

Line 1 assigns one to the variable s to put Algorithm 4.1 in subtraction mode to subtract p when needed. Line 3 compares N against p using the algorithm explained in Section 3.6.3. If the algorithm returns G = 1 or E = 1, reduction modulo p is needed. The reduction is done by using Algorithm 4.1 (Line 5) and the reduced number is assigned to N′. Otherwise, when the comparison result is L = 1, no reduction is needed and the algorithm assigns N to N′ (Line 7). The reduction algorithm iterates until the input number is reduced to a value less than p.

Algorithm 4.2 Pseudo code to reduce a number modulo p (Reduce): N mod p.

Input: N, p, LN, Lp, m
Output: N′

1: s ← 1    ▷ Subtraction mode
2: while N ≥ p do
3:     [G, E, L] = Compare(N, p, LN, Lp, m)
4:     if G = 1 or E = 1 then
5:         N′ ← AddSub(N, p, LN, Lp, s, m)
6:     else
7:         N′ ← N
8:         break;
9:     end if
10: end while
11: return N′
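The control flow of the reduction step can be sketched in Python with plain integers standing in for attribute lists. This is an assumption made to keep the example short: `compare` and the subtraction below are stand-ins for Algorithm 3.5 and for AddSub in subtraction mode, not implementations of them, and the inputs are assumed to be non-negative.

```python
def compare(n, p):
    """Return the (G, E, L) flags; plain integers stand in for attribute lists."""
    return (int(n > p), int(n == p), int(n < p))


def reduce_mod(n, p):
    """Mirror the control flow of Algorithm 4.2: subtract p while N >= p."""
    while True:
        g, e, _ = compare(n, p)
        if not (g or e):
            return n          # N < p, no further reduction needed
        n = n - p             # stands in for AddSub(N, p, ..., s = 1)


def mod_add(n1, n2, p):
    """Modular addition: plain addition followed by reduction modulo p."""
    return reduce_mod(n1 + n2, p)


p192 = 2**192 - 2**64 - 1
print(mod_add(p192 - 1, p192 - 1, p192) == p192 - 2)
```

When both operands are already less than p, their sum is less than 2p, so the loop performs at most one subtraction.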

4.3 Large Integer Addition in Binary Representation

In this dissertation we consider two types of binary adders: the ripple carry adder (RCA) as the baseline and one of the faster parallel prefix adders (PPA) discussed in Section 4.3.2.

4.3.1 Using Ripple Carry Addition Technique

Regardless of software or hardware implementations, large integers are stored in memory in the form of words. Addition or subtraction operations naturally operate on the words in a sequential fashion due to the carry propagation problem. To ensure fast operations and prevent stalls, blocks of words must be accessed and placed in processor’s cache.

Algorithm 4.3 shows the pseudo code for binary addition of m-bit large integers, where the machine word size is assumed to be w.

Line 2 determines the number of iterations, which depends on m and w. Line 6 is the binary addition method or function AddWords, which takes two inputs from the input numbers N1 and N2. Addition is done ultimately in hardware using the built-in adder in the machine ALU.
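Word-by-word addition with a rippling carry can be sketched in Python as follows. This is a generic illustration under stated assumptions, not a transcription of Algorithm 4.3: the function `add_words` is a hypothetical stand-in for AddWords, the iteration count is taken as ⌈m/w⌉, and Python integers are used to emulate w-bit machine words.

```python
def add_words(a, b, carry_in, w):
    """Add two w-bit words plus a carry; return (sum word, carry out)."""
    total = a + b + carry_in
    return total & ((1 << w) - 1), total >> w


def ripple_add(n1, n2, m, w):
    """Add two m-bit integers one w-bit word at a time, LSW first."""
    words = -(-m // w)                   # ceil(m / w) iterations
    mask = (1 << w) - 1
    carry = 0
    result = 0
    for k in range(words):
        a = (n1 >> (k * w)) & mask       # k-th word of each operand
        b = (n2 >> (k * w)) & mask
        s, carry = add_words(a, b, carry, w)
        result |= s << (k * w)           # place the sum word in position
    return result, carry                 # final carry signals overflow past m bits


print(ripple_add(2_066_400, 262_016, 32, 8))
```

Each iteration needs the carry produced by the previous one, which is the sequential dependency that the attribute representation is intended to avoid.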
