
IJSRD - International Journal for Scientific Research & Development| Vol. 1, Issue 5, 2013 | ISSN (online): 2321-0613

All rights reserved by www.ijsrd.com 1184

Abstract: The Advanced Encryption Standard (AES) has been adopted as a Federal Information Processing Standard (FIPS). In traditional look-up table (LUT) approaches, the unbreakable delay of the table is longer than the total delay of the remaining operations in each round, and the LUT consumes a large area. It is more efficient to apply composite field arithmetic in the SubBytes transformation of the AES algorithm: this not only reduces the complexity but also enables deep sub-pipelining, so that higher speed can be achieved. An isomorphic mapping converts $GF(2^8)$ to $GF(((2^2)^2)^2)$, where the multiplicative inverse is easy to obtain. The SubBytes and InvSubBytes transformations are merged using composite field arithmetic, which is essential for a low-cost, high-throughput AES architecture. Compared with the typical ROM-based lookup table, the presented implementation is capable of higher speed, since it can be pipelined, and is small in terms of area (137/1920 slices on a Spartan-3 XC3S200-5 FPGA).

Keywords: Composite field, Isomorphic mapping.

I. INTRODUCTION

Cryptography is essential for data transmission, given the rapidly growing number of Internet and wireless communication users. The Advanced Encryption Standard (AES) was standardized by the National Institute of Standards and Technology (NIST) as a Federal Information Processing Standard (FIPS). It is a cryptographic algorithm used to protect data and can be used for both encryption and decryption. Encryption converts data (plaintext) into ciphertext; decryption converts ciphertext back into its original form, the plaintext. Cryptographic keys of 128, 192, and 256 bits can be used to encrypt and decrypt data in blocks of 128 bits. Typical applications of AES include cell phones, smart cards, WWW servers, automated teller machines, and digital video recorders.

Many architectures have been proposed for hardware implementation of the AES algorithm. The main idea pursued here is to use composite field arithmetic to compute the multiplicative inverse in the SubBytes/InvSubBytes transformation, so that deep sub-pipelining can be applied and hardware complexity is reduced. This paper adopts such an alternative architecture to achieve a small area. High throughput is obtained without a LUT or memory, so no unbreakable delay is introduced into the architecture. In traditional LUT approaches, the unbreakable delay is longer than the total delay of the remaining operations in each round, and pipelining and sub-pipelining cannot be applied to the LUT itself. The LUT approach is also unsuitable for resource-constrained applications because it consumes a large area. Composite field arithmetic can be used to solve these problems.

Finding the multiplicative inverse in $GF(2^8)$ by a direct method is very complicated. However, any two finite fields of the same order are isomorphic, so an isomorphic transform can be used to convert $GF(2^8)$ to $GF((2^4)^2)$ and further to $GF(((2^2)^2)^2)$.

The algorithm takes a 128-bit (16-byte) plaintext block as input. The key length can be 16, 24, or 32 bytes (128, 192, or 256 bits), and the algorithm is referred to as AES-128, AES-192, or AES-256 depending on the key length. The input to the encryption and decryption algorithms is a single 128-bit block. In FIPS PUB 197, this block is depicted as a 4x4 square matrix of bytes. This block is copied into the State array, which is modified at each stage of encryption or decryption; after the final stage, the State is copied to an output matrix. Similarly, the key is depicted as a matrix of bytes with four rows (and four, six, or eight columns), which is then expanded into an array of key schedule words. Each byte in the State is an element of the Galois field $GF(2^8)$ constructed with the irreducible polynomial $p(x) = x^8 + x^4 + x^3 + x + 1$.
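To make the field concrete, here is a minimal Python sketch of arithmetic in $GF(2^8)$ modulo $p(x)$ (hexadecimal 0x11B). The names `gf_mul` and `gf_inv` are illustrative, not taken from the paper. Addition is XOR, and multiplication is shift-and-add with reduction whenever the degree reaches 8. The inverse is computed as $a^{254}$, which is valid because every nonzero element satisfies $a^{255} = 1$; 0 is mapped to 0, as AES requires.

```python
AES_POLY = 0x11B  # p(x) = x^8 + x^4 + x^3 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add, reducing modulo AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY        # reduce when the degree reaches 8
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (square-and-multiply); 0 maps to 0."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result if a else 0


print(hex(gf_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS 197
print(hex(gf_inv(0x53)))        # 0xca
```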

The algorithm consists of N rounds, where the number of rounds depends on the key length: 10 rounds for a 16-byte key, 12 rounds for a 24-byte key, and 14 rounds for a 32-byte key. Each of the first N-1 rounds consists of four distinct transformations: SubBytes, ShiftRows, MixColumns, and AddRoundKey. The final round contains only three transformations (MixColumns is omitted), and a single AddRoundKey transformation is applied before the first round. Each transformation takes one or more 4x4 matrices as input and produces a 4x4 matrix as output.
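The round structure above can be summarized as a sequence of transformation names. This is only a bookkeeping sketch (the function `aes_transform_sequence` is hypothetical), not an AES implementation.

```python
ROUNDS_BY_KEY_BYTES = {16: 10, 24: 12, 32: 14}


def aes_transform_sequence(key_bytes: int) -> list[str]:
    """Return the order of transformations applied by AES encryption."""
    n = ROUNDS_BY_KEY_BYTES[key_bytes]               # N depends on the key length
    seq = ["AddRoundKey"]                            # single transformation before round 1
    for _ in range(n - 1):                           # rounds 1 .. N-1
        seq += ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    seq += ["SubBytes", "ShiftRows", "AddRoundKey"]  # final round: no MixColumns
    return seq


for k in (16, 24, 32):
    seq = aes_transform_sequence(k)
    print(k, "bytes:", ROUNDS_BY_KEY_BYTES[k], "rounds,", len(seq), "transformations")
```

SubBytes therefore runs once per round (10, 12, or 14 times per block), which is why its delay and area dominate the design choices discussed below.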

II. SUBBYTE/INVERSE SUBBYTE USING LOOK UP TABLE (LUT)

The byte substitution transformation (SubBytes) is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box) [1].

Fig. 1: Application of the S-box to each byte of the State

This S-box is invertible and is constructed by composing two transformations [4].

FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm

Neethan Elizabeth Abraham

M.Tech in Communication System

Department of Electronics and Communication Engineering

Federal Institute of Science and Technology (FISAT), Angamaly, India

FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm (IJSRD/Vol. 1/Issue 5/2013/037)

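1. Find the multiplicative inverse in the finite field $GF(2^8)$; the element {00} is mapped to itself.

2. Apply the following affine transformation over $GF(2)$:

$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$

for $0 \le i < 8$, where $b_i$ is the $i$th bit of the byte and $c_i$ is the $i$th bit of a byte $c$ with the value {63}, i.e. {01100011}. In matrix form, the affine transformation element of the S-box can be expressed as in [1]: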

Fig. 2: Matrix Notation of S-box

The S-box used in the SubBytes transformation is presented in hexadecimal form in Table 1. For example, if $s_{1,1}$ = {f0}, the substitution value is determined by the intersection of the row with index 'f' and the column with index '0' in Table 1, which gives $s'_{1,1}$ = {8c}.

Table 1: S-box Values for All 256 Combinations in Hexadecimal Format
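The two-step construction and the row/column lookup can be checked with a short Python sketch that builds the 256-entry table directly from the definition: inversion in $GF(2^8)$ (here by exhaustive search, which is acceptable for a 256-element field) followed by the affine transformation with $c$ = {63}. The helper names are hypothetical; the sketch illustrates the definition, not the hardware.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo p(x) = x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search; {00} maps to itself."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


def affine(b: int) -> int:
    """b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i (indices mod 8), c = {63}."""
    out = 0
    for i in range(8):
        bit = 0
        for k in (0, 4, 5, 6, 7):
            bit ^= (b >> ((i + k) % 8)) & 1
        out |= bit << i
    return out ^ 0x63


SBOX = [affine(gf_inv(x)) for x in range(256)]  # contents of Table 1


def lookup(x: int) -> int:
    """Row index = high nibble, column index = low nibble."""
    row, col = x >> 4, x & 0x0F
    return SBOX[16 * row + col]


print(hex(lookup(0xF0)))  # 0x8c, as in the example above
```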

The inverse byte substitution transformation (InvSubBytes) is the inverse of SubBytes: the inverse S-box is applied to each byte of the State. It is obtained by first applying the inverse of the affine transformation and then taking the multiplicative inverse in $GF(2^8)$.
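A matching Python sketch of InvSubBytes applies the inverse affine map first and then inverts. The inverse affine map is written here in an equivalent rotate-and-XOR form, $b_i = b'_{(i+2) \bmod 8} \oplus b'_{(i+5) \bmod 8} \oplus b'_{(i+7) \bmod 8} \oplus d_i$ with $d$ = {05}. This form is an assumption of the sketch rather than something stated in the text; the values checked below confirm it undoes the forward S-box. All helper names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search; {00} maps to itself."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


def rotl8(x: int, k: int) -> int:
    """Rotate a byte left by k bits (0 < k < 8)."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def inv_affine(b: int) -> int:
    """Inverse affine transformation: rotations by 1, 3 and 6 select b'_(i+7), b'_(i+5), b'_(i+2)."""
    return rotl8(b, 1) ^ rotl8(b, 3) ^ rotl8(b, 6) ^ 0x05


def inv_sub_byte(x: int) -> int:
    """InvSubBytes for one byte: inverse affine first, then inversion in GF(2^8)."""
    return gf_inv(inv_affine(x))


INV_SBOX = [inv_sub_byte(x) for x in range(256)]  # contents of Table 2
print(hex(INV_SBOX[0x8C]))  # 0xf0, undoing the SubBytes example
```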

The most common way of implementing the S-box for the SubBytes operation is to store the pre-computed values in a ROM as a lookup table. All 256 values are stored in the ROM, and the input byte is wired to the ROM's address bus. However, this method has the disadvantage that the unbreakable delay is very large, since a ROM has a fixed access time for its read operation. Such an implementation is also expensive in terms of hardware and consumes a large area. A better way of implementing the S-box is therefore to use composite field arithmetic: such an S-box occupies a small area, and pipelining can be applied to improve performance.
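In the LUT approach, the 256 S-box values are computed offline and loaded into the ROM. The Python sketch below produces them in the plain text format read by Verilog's `$readmemh` system task: one hexadecimal word per line, with address 0 first. The helper names and the idea of writing the result to a file such as a hypothetical `sbox_rom.hex` are assumptions for illustration; the paper does not say how its ROM was initialized.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search; {00} maps to itself."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sub_byte(x: int) -> int:
    """S-box entry: inversion, then the affine map in rotate-and-XOR form with c = {63}."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63


def rom_image(table: list[int]) -> str:
    """One two-digit hex word per line, address 0 first."""
    return "\n".join(f"{v:02x}" for v in table) + "\n"


image = rom_image([sub_byte(x) for x in range(256)])
print(image.splitlines()[:4])  # ['63', '7c', '77', '7b']
```

The fixed 256 x 8 table is exactly what makes the ROM's access time an indivisible step: no register can be placed inside it, which is the unbreakable delay discussed above.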

Table 2: Inverse S-box Values for All 256 Combinations in Hexadecimal Format

Fig. 3: Application of the Inverse S-box to Each Byte of the State

III. SUBBYTE/INVERSE SUBBYTE USING COMPOSITE FIELD

The steps involved in the SubBytes and InvSubBytes transformations are shown below:

SubBytes: multiplicative inversion in $GF(2^8)$, followed by the affine transformation.

InvSubBytes: inverse affine transformation, followed by multiplicative inversion in $GF(2^8)$.

The affine transformation and its inverse can be represented in matrix form.


Fig. 3: Architecture of merged SubBytes/InvSubBytes
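Because both transformations contain the same multiplicative inversion, one inverter can be shared, with multiplexers selecting what surrounds it. The Python sketch below models that idea, using the mode convention from the simulation ('0' for encryption, '1' for decryption). For brevity it inverts directly in $GF(2^8)$; in the hardware this block is the composite-field inverter. The function names are illustrative, not the paper's.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Stand-in for the shared inverter (exhaustive search); {00} maps to itself."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def affine(b: int) -> int:
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(b: int) -> int:
    return rotl8(b, 1) ^ rotl8(b, 3) ^ rotl8(b, 6) ^ 0x05


def merged_sub_byte(x: int, mode: int) -> int:
    """mode 0: SubBytes (inversion, then affine); mode 1: InvSubBytes (inverse affine, then inversion)."""
    pre = inv_affine(x) if mode else x   # input-side multiplexer
    y = gf_inv(pre)                      # the single shared inverter
    return y if mode else affine(y)      # output-side multiplexer


print(hex(merged_sub_byte(0x53, 0)), hex(merged_sub_byte(0xED, 1)))  # 0xed 0x53
```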

Computing the multiplicative inverse directly in $GF(2^8)$ is very complicated. An isomorphic function $\delta$ can be used to map an element of $GF(2^8)$ to the composite field $GF(((2^2)^2)^2)$, where the multiplicative inverse is computed; the result is then converted back to its equivalent in $GF(2^8)$ by the inverse isomorphic function $\delta^{-1}$. Both $\delta$ and $\delta^{-1}$ can be represented as $8 \times 8$ matrices over $GF(2)$. Let $q$ be an element of $GF(2^8)$; then the isomorphic mapping is $\delta \cdot q$ and the inverse isomorphic mapping is $\delta^{-1} \cdot q$, where $q_7$ is the most significant bit and $q_0$ is the least significant bit [4].

The matrix multiplication can be converted into logical XOR operations. The multiplicative inverse in $GF(((2^2)^2)^2)$ can be calculated as follows [4]:

Fig. 4: Multiplicative inverse module
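Since $\delta$ and $\delta^{-1}$ are $8 \times 8$ matrices over $GF(2)$, each output bit of $\delta \cdot q$ is the XOR of the input bits selected by one matrix row, and $\delta^{-1}$ can be obtained from $\delta$ by Gauss-Jordan elimination over $GF(2)$. The text does not reproduce the concrete coefficients of $\delta$ from [4], so this minimal Python sketch exercises the routines on a stand-in matrix, the linear part of the AES affine transformation. The names `mat_vec` and `mat_inv` are illustrative.

```python
def mat_vec(rows: list[int], q: int) -> int:
    """Multiply an 8x8 GF(2) matrix (rows[i] = bit mask of output bit i) by byte q.
    Each output bit is the XOR (parity) of the input bits selected by its row."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & q).count("1") & 1) << i
    return out


def mat_inv(rows: list[int]) -> list[int]:
    """Invert an 8x8 GF(2) matrix by Gauss-Jordan elimination (StopIteration if singular)."""
    n = len(rows)
    a = list(rows)
    inv = [1 << i for i in range(n)]          # start from the identity matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if (a[r] >> col) & 1)
        a[col], a[pivot] = a[pivot], a[col]
        inv[col], inv[pivot] = inv[pivot], inv[col]
        for r in range(n):
            if r != col and (a[r] >> col) & 1:
                a[r] ^= a[col]                # row addition over GF(2) is XOR
                inv[r] ^= inv[col]
    return inv


# Stand-in matrix: row i selects input bits i, i+4, i+5, i+6, i+7 (mod 8).
AFFINE_ROWS = [sum(1 << ((i + k) % 8) for k in (0, 4, 5, 6, 7)) for i in range(8)]
INV_ROWS = mat_inv(AFFINE_ROWS)

assert all(mat_vec(INV_ROWS, mat_vec(AFFINE_ROWS, q)) == q for q in range(256))
print([f"{r:08b}" for r in INV_ROWS])
```

Each row with $w$ ones costs $w - 1$ two-input XOR gates, so the sparsity of $\delta$ and $\delta^{-1}$ directly sets the size of the mapping logic.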

The notations for the modules within the multiplicative inversion module are shown below [4]:

Fig. 5: Notations for the building blocks within the multiplicative inversion module.

Each of the above components in $GF(2^4)$ can be implemented as follows [4]:

Fig. 6: Implementation of squarer in $GF(2^4)$

Fig. 7: Implementation of multiplication with constant $\lambda$

Fig. 8: Implementation of multiplication in $GF(2^4)$

Fig. 9: Implementation of multiplication in $GF(2^2)$

Fig. 10: Implementation of multiplication with constant $\phi$
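The building blocks in Figs. 6-10 reduce to a few AND and XOR gates each. The Python sketch below writes them as bit-level expressions and checks each one against a general multiplier. It assumes a field representation commonly used with [4], which the text does not spell out: $GF(2^2)$ modulo $x^2 + x + 1$; $GF(2^4) = GF(2^2)[x]/(x^2 + x + \phi)$ with $\phi$ = {10}; and $GF(2^8) = GF(2^4)[x]/(x^2 + x + \lambda)$ with $\lambda$ = {1100}. The function names are hypothetical.

```python
PHI = 0b10       # assumed constant: GF(2^4) = GF(2^2)[x] / (x^2 + x + PHI)
LAMBDA = 0b1100  # assumed constant: GF(2^8) = GF(2^4)[x] / (x^2 + x + LAMBDA)


def bits4(q: int) -> tuple[int, int, int, int]:
    return (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1


def gf4_mul(q: int, w: int) -> int:
    """Multiplication in GF(2^2) modulo x^2 + x + 1: two output bits of AND/XOR."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    k1 = (q1 & w1) ^ (q0 & w1) ^ (q1 & w0)
    k0 = (q1 & w1) ^ (q0 & w0)
    return (k1 << 1) | k0


def gf4_mul_phi(q: int) -> int:
    """Multiplication by the constant PHI: wiring plus one XOR."""
    q1, q0 = q >> 1, q & 1
    return ((q1 ^ q0) << 1) | q1


def gf16_mul(q: int, w: int) -> int:
    """Multiplication in GF(2^4) using three GF(2^2) multipliers (Karatsuba form)."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    low = gf4_mul(ql, wl)
    kh = gf4_mul(qh ^ ql, wh ^ wl) ^ low
    kl = gf4_mul_phi(gf4_mul(qh, wh)) ^ low
    return (kh << 2) | kl


def gf16_square(q: int) -> int:
    """Squarer in GF(2^4): XOR gates only."""
    q3, q2, q1, q0 = bits4(q)
    return (q3 << 3) | ((q3 ^ q2) << 2) | ((q2 ^ q1) << 1) | (q3 ^ q1 ^ q0)


def gf16_mul_lambda(q: int) -> int:
    """Multiplication by the constant LAMBDA: XOR gates only."""
    q3, q2, q1, q0 = bits4(q)
    return ((q2 ^ q0) << 3) | ((q3 ^ q2 ^ q1 ^ q0) << 2) | (q3 << 1) | q2


# Each dedicated block must agree with the general multiplier.
assert all(gf4_mul_phi(q) == gf4_mul(q, PHI) for q in range(4))
assert all(gf16_square(q) == gf16_mul(q, q) for q in range(16))
assert all(gf16_mul_lambda(q) == gf16_mul(q, LAMBDA) for q in range(16))
print("building blocks agree with the general GF(2^4) multiplier")
```

Squaring and multiplying by a fixed constant are linear over $GF(2)$, so they need no AND gates at all; only the general multipliers do.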

Earlier work [4] derived a formula for the multiplicative inverse of $q$, where $q$ is an element of $GF(2^4)$, giving each bit of $q^{-1} = \{q_3^{-1}, q_2^{-1}, q_1^{-1}, q_0^{-1}\}$ directly in terms of the bits of $q$.
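The pieces can be assembled into a complete composite-field inverter and checked against direct inversion in $GF(2^8)$. This minimal Python sketch uses the same assumed constants ($\phi$ = {10}, $\lambda$ = {1100}). `gf16_inv` is a closed-form $GF(2^4)$ inverse with one AND/XOR expression per output bit, of the kind derived in [4], and is verified below rather than taken on trust. The 8-bit inverse of $q = (q_H, q_L)$ uses $d = q_H^2\lambda \oplus (q_H \oplus q_L)q_L$ and $q^{-1} = (q_H d^{-1},\ (q_H \oplus q_L) d^{-1})$. Because the text does not reproduce the coefficients of $\delta$, the sketch builds its own isomorphism by searching the composite field for a root $\beta$ of $p(x)$ and mapping $\alpha^i$ to $\beta^i$. A different root gives a different but equally valid $\delta$, so this matrix need not match the one in [4].

```python
PHI, LAMBDA = 0b10, 0b1100  # assumed composite-field constants


def gf4_mul(q, w):
    """GF(2^2) multiplication modulo x^2 + x + 1."""
    q1, q0, w1, w0 = q >> 1, q & 1, w >> 1, w & 1
    return (((q1 & w1) ^ (q0 & w1) ^ (q1 & w0)) << 1) | ((q1 & w1) ^ (q0 & w0))


def gf16_mul(q, w):
    """GF((2^2)^2) multiplication modulo x^2 + x + PHI."""
    qh, ql, wh, wl = q >> 2, q & 3, w >> 2, w & 3
    low = gf4_mul(ql, wl)
    return ((gf4_mul(qh ^ ql, wh ^ wl) ^ low) << 2) | (gf4_mul(gf4_mul(qh, wh), PHI) ^ low)


def gf16_inv(q):
    """Closed-form GF(2^4) inverse: one AND/XOR expression per output bit (0 -> 0)."""
    q3, q2, q1, q0 = (q >> 3) & 1, (q >> 2) & 1, (q >> 1) & 1, q & 1
    a3 = q3 ^ (q3 & q2 & q1) ^ (q3 & q0) ^ q2
    a2 = (q3 & q2 & q1) ^ (q3 & q2 & q0) ^ (q3 & q0) ^ q2 ^ (q2 & q1)
    a1 = q3 ^ (q3 & q2 & q1) ^ (q3 & q1 & q0) ^ q2 ^ (q2 & q0) ^ q1
    a0 = ((q3 & q2 & q1) ^ (q3 & q2 & q0) ^ (q3 & q1) ^ (q3 & q1 & q0) ^ (q3 & q0)
          ^ q2 ^ (q2 & q1) ^ (q2 & q1 & q0) ^ q1 ^ q0)
    return (a3 << 3) | (a2 << 2) | (a1 << 1) | a0


def gfc_mul(q, w):
    """GF(((2^2)^2)^2) multiplication modulo x^2 + x + LAMBDA."""
    qh, ql, wh, wl = q >> 4, q & 15, w >> 4, w & 15
    low = gf16_mul(ql, wl)
    return ((gf16_mul(qh ^ ql, wh ^ wl) ^ low) << 4) | (gf16_mul(gf16_mul(qh, wh), LAMBDA) ^ low)


def gfc_inv(q):
    """Composite-field inverse: a single GF(2^4) inversion of d."""
    qh, ql = q >> 4, q & 15
    d_inv = gf16_inv(gf16_mul(gf16_mul(qh, qh), LAMBDA) ^ gf16_mul(qh ^ ql, ql))
    return (gf16_mul(qh, d_inv) << 4) | gf16_mul(qh ^ ql, d_inv)


def gf256_mul(a, b):
    """Reference multiplication in GF(2^8) modulo 0x11B (AES polynomial basis)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def powers(x, n):
    """[x^0, ..., x^(n-1)] computed with the composite-field multiplier."""
    out = [1]
    for _ in range(n - 1):
        out.append(gfc_mul(out[-1], x))
    return out


def p_at(x):
    """Evaluate p(t) = t^8 + t^4 + t^3 + t + 1 at a composite-field element."""
    pw = powers(x, 9)
    return pw[8] ^ pw[4] ^ pw[3] ^ pw[1] ^ pw[0]


BETA = next(x for x in range(256) if p_at(x) == 0)  # hypothetical choice: first root found
BASIS = powers(BETA, 8)                              # columns of this sketch's delta matrix
DELTA = [0] * 256
for q in range(256):
    for i in range(8):
        if (q >> i) & 1:
            DELTA[q] ^= BASIS[i]                     # matrix-vector product over GF(2) is XOR
DELTA_INV = [0] * 256
for q in range(256):
    DELTA_INV[DELTA[q]] = q


def composite_inverse(q):
    """delta -> inversion in GF(((2^2)^2)^2) -> delta^-1."""
    return DELTA_INV[gfc_inv(DELTA[q])]


print(hex(composite_inverse(0x53)))  # 0xca, same as direct inversion in GF(2^8)
```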

IV. SIMULATION AND IMPLEMENTATION

The merged SubBytes/InvSubBytes architecture is implemented on a Xilinx Spartan-3 XC3S200-5 FPGA and simulated with ModelSim 6.1. Merging the inverse isomorphic mapping with the affine transformation reduces the area occupied by the S-box; therefore, in the FPGA implementation, the $\delta^{-1}$ and affine transformation modules are combined to reduce the number of slices occupied by the S-box. Implementing the S-box as one continuous combinational path would be costly in terms of logic delay, since deep logic severely reduces the highest achievable clock frequency.
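Because $\delta^{-1}$ and the linear part of the affine transformation are both $8 \times 8$ matrices over $GF(2)$, their product can be precomputed. The merged block then becomes a single XOR network followed by XOR with the constant {63}. The Python sketch below shows only this pre-multiplication step. `DELTA_INV_ROWS` is a randomly generated placeholder, not the $\delta^{-1}$ of [4], and all names are illustrative.

```python
import random


def mat_vec(rows, q):
    """Apply an 8x8 GF(2) matrix (rows[i] masks the inputs of output bit i) to byte q."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & q).count("1") & 1) << i
    return out


def mat_mul(a_rows, b_rows):
    """Rows of A*B over GF(2): row i is the XOR of the B rows selected by row i of A."""
    product = []
    for a in a_rows:
        m = 0
        for j, b in enumerate(b_rows):
            if (a >> j) & 1:
                m ^= b
        product.append(m)
    return product


AFFINE_ROWS = [sum(1 << ((i + k) % 8) for k in (0, 4, 5, 6, 7)) for i in range(8)]
rng = random.Random(2013)
DELTA_INV_ROWS = [rng.randrange(256) for _ in range(8)]  # hypothetical placeholder for delta^-1

MERGED_ROWS = mat_mul(AFFINE_ROWS, DELTA_INV_ROWS)       # one matrix instead of two


def merged_output(q):
    """delta^-1 followed by the affine transformation, as one XOR network plus {63}."""
    return mat_vec(MERGED_ROWS, q) ^ 0x63


# The merged network must match applying the two stages one after the other.
assert all(merged_output(q) == mat_vec(AFFINE_ROWS, mat_vec(DELTA_INV_ROWS, q)) ^ 0x63
           for q in range(256))
print("merged matrix rows:", [f"{r:08b}" for r in MERGED_ROWS])
```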

Fig. 11: Simulation of the S-box using composite field arithmetic for encryption and decryption

The figure above shows the simulation results of the S-box and inverse S-box for encryption and decryption using composite field arithmetic. There are three inputs: the clock, the 8-bit input value, and a mode bit that selects encryption or decryption ('0' for encryption, '1' for decryption). FPGA implementations were carried out for both the LUT-based and the non-LUT SubBytes/InvSubBytes, and their synthesis reports were analyzed and compared.

Parameter                    Without LUT    Using LUT
Number of slices             137/1920       162/1920
Number of slice flip-flops   231/3840       33/3840
Maximum frequency            226.706 MHz    184.298 MHz
Minimum period               4.411 ns       5.42 ns

Table 3: Comparison of the S-box using LUT and without LUT

V. CONCLUSION

In traditional look-up table (LUT) approaches, the unbreakable delay is longer than the total delay of the remaining operations in each round, and the LUT approach is unsuitable for resource-constrained use because it costs a large area. Composite field arithmetic has been used to solve these problems, since computing the multiplicative inverse in $GF(2^8)$ directly is very complicated. Merging the SubBytes and InvSubBytes datapaths further reduces the area and increases the throughput.

The presented implementation is capable of higher speed than the typical ROM-based lookup table, can be pipelined, and is small in terms of area (137/1920 slices on a Spartan-3 XC3S200-5 FPGA). This compact, high-speed architecture allows the S-box to be used in both area-limited and throughput-demanding AES chips for applications ranging from small smart cards to high-speed servers.

REFERENCES

[1] Advanced Encryption Standard (AES), FIPS PUB 197, Federal Information Processing Standards Publication 197, Nov. 26, 2001.

[2] X. Zhang and K. K. Parhi, "High-speed VLSI architectures for the AES algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, pp. 957-967, Sept. 2004.

[3] H. Kuo and I. Verbauwhede, "Architecture optimization for a 1.82Gbit/s VLSI implementation of the AES Rijndael algorithm," Proc. 3rd Int. Workshop on Cryptographic Hardware and Embedded Systems (CHES 2001), May 2001, pp. 51-64.

[4] E. N. C. Mui, "Practical Implementation of Rijndael S-Box Using Combinational Logic," Custom R&D Engineer, Texco Enterprise Ptd. Ltd.

[5] X. Zhang and K. K. Parhi, "On the Optimum Constructions of Composite Field for the AES Algorithm."

[6] "A High-Throughput Cost-Effective ASIC Implementation of the AES Algorithm," IEEE, 2009, ISBN 978-1-4244-3870-9.