Mapping the AES Algorithm to MorphoSys Architecture

Post on 23-Jan-2016


Mapping the AES Algorithm to MorphoSys Architecture

Ye Tang

Aug 2001

MorphoSys Project, UC Irvine

Overview

• Part I: AES Algorithm Introduction
• Part II: Mapping to MorphoSys
• Part III: Performance Evaluation

Part I: AES Introduction


What Is AES?

• Advanced Encryption Standard
• Next-generation cryptographic algorithm for use by U.S. Government organizations to protect sensitive (unclassified) information.
• Expected to eventually replace the current standards, DES (Data Encryption Standard) and Triple DES.

AES Development Timeline

• The National Institute of Standards and Technology (NIST) worked with industry and the cryptographic community to develop AES
– Jan 1997: NIST announced the initiation of the AES development effort
– Aug 1998: NIST announced that fifteen algorithms were selected as candidates
– Apr 1999: NIST selected five of the fifteen algorithms as the AES finalists
– Oct 2000: NIST announced that Rijndael had been selected for the AES
– Feb 2001: Draft FIPS for the AES published for public comment
– May 2001: Comment period closes
– ? 2001: AES FIPS becomes official

Rijndael Overview

• A symmetric block cipher developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen
• Operates on data blocks of 128 bits, using cipher keys with lengths of 128, 192, or 256 bits
• Very good performance in both hardware and software across a wide range of computing environments
• Very high security level. Even with future advances in technology, it has the potential to remain secure well beyond twenty years.

Terms Used in Rijndael

• Round
– Rijndael is an iterated block cipher. Every iteration is called a Round. Rijndael has 10, 12, or 14 Rounds when the Cipher Key size is 128, 192, or 256 bits, respectively.
• Cipher Key
– The original secret key used for encryption or decryption, also shortened to "Key". The size can be 128, 192, or 256 bits.
• Round Key
– A series of keys derived from the Cipher Key. Every Round needs a Round Key, and a Round Key is always 128 bits long.
• Key Expansion
– The routine used to generate all Round Keys from the Cipher Key.

Basic Structure of Encryption

• Initialization
– Key Expansion: Get all necessary Round Keys from the Cipher Key.
• Data Processing
1. Initial step
   1. RoundKeyAddition()
2. Intermediate Rounds (9, 11, or 13 Rounds, i.e., all of the 10, 12, or 14 Rounds except the last)
   1. SubBytes()
   2. ShiftRows()
   3. MixColumns()
   4. RoundKeyAddition()
3. Final Round
   1. SubBytes()
   2. ShiftRows()
   3. RoundKeyAddition()

Sequence: A BSMA BSMA … BSA (A = RoundKeyAddition, B = SubBytes, S = ShiftRows, M = MixColumns)
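The round structure of encryption can be sketched as a small generator of the step sequence. This is only an illustration: the function name `round_sequence` is hypothetical, and the letters follow the slide's shorthand under the assumption that B stands for SubBytes.

```python
def round_sequence(nr: int) -> str:
    """Return the step sequence of a cipher with nr Rounds in total.

    A = RoundKeyAddition, B = SubBytes, S = ShiftRows, M = MixColumns.
    """
    initial = "A"                       # initial Round Key addition
    full_rounds = ["BSMA"] * (nr - 1)   # every Round except the last one
    final = "BSA"                       # the final Round skips MixColumns
    return " ".join([initial, *full_rounds, final])


for key_bits, nr in ((128, 10), (192, 12), (256, 14)):
    print(key_bits, round_sequence(nr))
```

Counting the letter M in the result gives the number of Rounds that include MixColumns(), which is one less than the total number of Rounds.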

Basic Structure of Decryption

• Initialization
– Key Expansion: Get all necessary Round Keys from the Cipher Key. The detailed procedure differs from that of Encryption.
• Data Processing (replace each function with its inverse)
1. Initial step
   1. InvRoundKeyAddition()
2. Intermediate Rounds (9, 11, or 13 Rounds, i.e., all of the 10, 12, or 14 Rounds except the last)
   1. InvSubBytes()
   2. InvShiftRows()
   3. InvMixColumns()
   4. InvRoundKeyAddition()
3. Final Round
   1. InvSubBytes()
   2. InvShiftRows()
   3. InvRoundKeyAddition()

Same sequence as Encryption: A BSMA BSMA … BSA

Math Background (1)

• Galois Field $GF(2^8)$
– A byte b, consisting of bits b7 b6 b5 b4 b3 b2 b1 b0, is considered as a polynomial with coefficients in {0,1}:

$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$

• Addition
– Coefficients are given by the sum of the coefficients of the two terms modulo 2

Example: '57' + '83' = 'D4' (hexadecimal)

$(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$

– Addition corresponds to bitwise XOR: '57' ⊕ '83' = 'D4'
• Subtraction (i.e., inverse of addition)
– The same as addition (also bitwise XOR)

Math Background (2)

• Multiplication
– Multiplication in $GF(2^8)$ corresponds to multiplication of polynomials modulo an irreducible binary polynomial of degree 8. For Rijndael, this irreducible polynomial is called m(x) and is given by

$m(x) = x^8 + x^4 + x^3 + x + 1$

Example: '57' • '83' = 'C1'

$(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) = x^{13} + x^{11} + x^9 + x^8 + x^7 + x^7 + x^5 + x^3 + x^2 + x + x^6 + x^4 + x^2 + x + 1 = x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$

$(x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1) \bmod (x^8 + x^4 + x^3 + x + 1) = x^7 + x^6 + 1$ (the result, 'C1')
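The addition and multiplication examples can be checked with a short sketch. It follows the two-stage description literally (carry-less polynomial product, then reduction modulo m(x)); the helper names `poly_mul`, `poly_mod`, and `gf_mul` are hypothetical, chosen only for this illustration.

```python
M_POLY = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1, one bit per coefficient


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^8) is a bitwise XOR."""
    return a ^ b


def poly_mul(a: int, b: int) -> int:
    """Multiply two binary polynomials without any reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(p: int, m: int = M_POLY) -> int:
    """Reduce the binary polynomial p modulo m by repeated XOR of shifted m."""
    while p.bit_length() >= m.bit_length():
        p ^= m << (p.bit_length() - m.bit_length())
    return p


def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) as defined for Rijndael."""
    return poly_mod(poly_mul(a, b))


print(hex(gf_add(0x57, 0x83)))    # addition example
print(hex(poly_mul(0x57, 0x83)))  # unreduced product, degree 13
print(hex(gf_mul(0x57, 0x83)))    # product after reduction modulo m(x)
```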

Math Background (3)

• Multiplication by x (xtime operation)
– The result before reduction modulo m(x) is:

$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$

• If b7 = 0, no reduction is needed;
• If b7 = 1, m(x) must be subtracted (i.e., XORed)
– In other words, b(x) * x can be implemented as a one-bit left shift and a subsequent conditional XOR with '1B'.
– Multiplication by x is denoted by a = xtime(b)
– Example:

'57' * '02' = xtime('57') = (0)10101110 = 'AE'
'57' * '04' = xtime(xtime('57')) = xtime('AE') = (1)01011100 ^ '1B' = '47'
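A minimal sketch of the xtime operation as described: a one-bit left shift, followed by a conditional XOR when bit b7 was set. XORing with 0x11B both clears the overflow bit and applies the '1B' correction in one step.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by x (i.e., by '02')."""
    b <<= 1
    if b & 0x100:      # b7 was 1: subtract (XOR) m(x)
        b ^= 0x11B
    return b


print(hex(xtime(0x57)))          # '57' * '02'
print(hex(xtime(xtime(0x57))))   # '57' * '04'
```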

Math Background (4)

• How to do multiplication in Rijndael? (e.g., '57' * '13')
– 1st Approach: use table lookup (two tables: Log & Alog)

mul('57','13') = Alogtable[(Logtable['57'] + Logtable['13']) % 255]
               = Alogtable[(98 + 14) % 255] = Alogtable[112] = 254 = 'FE'

• A logarithmic table and an anti-logarithmic table are used

– 2nd Approach: use the xtime operation

'57' * '13' = '57' * ('01' ^ '02' ^ '10')
            = '57' * '01' ^ '57' * '02' ^ '57' * '10'
            = '57' ^ 'AE' ^ '07' = 'FE'

Notice: xtime can be implemented directly in hardware, but in MorphoSys it is still implemented by table lookup. The difference from the 1st approach is that it needs only one table.
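Both approaches can be sketched side by side. The tables below are built rather than copied, under the assumption that the generator is '03' (which is consistent with the logarithm values quoted on the slide); `build_log_tables`, `mul_tables`, `mul_xtime`, and the `XTIME` lookup table are hypothetical names for illustration.

```python
def xtime(b: int) -> int:
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def build_log_tables():
    """Build Log and Alog tables for GF(2^8), assuming generator '03'."""
    log = [0] * 256
    alog = [0] * 256
    value = 1
    for exponent in range(255):
        alog[exponent] = value
        log[value] = exponent
        value ^= xtime(value)      # value * '03' = value * x + value
    return log, alog


LOG, ALOG = build_log_tables()


def mul_tables(a: int, b: int) -> int:
    """1st approach: two table lookups and one addition modulo 255."""
    if a == 0 or b == 0:
        return 0                   # zero has no logarithm
    return ALOG[(LOG[a] + LOG[b]) % 255]


# 2nd approach: a single 256-entry table that holds xtime for every byte.
XTIME = [xtime(b) for b in range(256)]


def mul_xtime(a: int, b: int) -> int:
    """Split b into powers of x and XOR the matching xtime multiples of a."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = XTIME[a]
        b >>= 1
    return result


print(LOG[0x57], LOG[0x13], hex(mul_tables(0x57, 0x13)), hex(mul_xtime(0x57, 0x13)))
```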

Math Background (5)

• Polynomial multiplication

$a(x) = a_3x^3 + a_2x^2 + a_1x + a_0$
$b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$

$c(x) = a(x) \cdot b(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$

$d(x) = c(x) \bmod (x^4 + 1)$ (final result)

After some simplification, $d(x) = a(x) \otimes b(x)$ can be represented as:

\[
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
=
\begin{bmatrix}
a_0 & a_3 & a_2 & a_1 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & a_1 & a_0 & a_3 \\
a_3 & a_2 & a_1 & a_0
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
\]
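The equivalence between the reduction modulo $x^4 + 1$ and the matrix form can be checked numerically. The sketch below is illustrative only; polynomials are given as lists `[p0, p1, p2, p3]` of GF(2^8) coefficients, and the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8) modulo m(x) = 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly4_mul_direct(a, b):
    """Compute c(x) = a(x) * b(x), then reduce modulo x^4 + 1."""
    c = [0] * 7
    for i in range(4):
        for j in range(4):
            c[i + j] ^= gf_mul(a[i], b[j])
    # x^4 = 1 modulo x^4 + 1, so c4, c5, c6 fold onto c0, c1, c2.
    return [c[0] ^ c[4], c[1] ^ c[5], c[2] ^ c[6], c[3]]


def poly4_mul_matrix(a, b):
    """Same product via the circulant matrix: d_i = sum_j a_((i-j) mod 4) * b_j."""
    d = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gf_mul(a[(i - j) % 4], b[j])
        d.append(acc)
    return d


a = [0x02, 0x01, 0x01, 0x03]   # coefficients a0..a3
b = [0xDB, 0x13, 0x53, 0x45]   # coefficients b0..b3
print([hex(v) for v in poly4_mul_direct(a, b)])
print([hex(v) for v in poly4_mul_matrix(a, b)])
```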

Basic Function 1a: SubBytes()

• 'b' = SubBytes('a')
– Substitute 'a' with 'b', which is the element at address 'a' in table S-box.
– Table S-box is a predefined 256-byte constant table.
– Example: SubBytes('00') = '63' ('63' is the first element in S-box)

Basic Function 1b: InvSubBytes()

• Similar to SubBytes()
• Only difference: the table is "Inv S-box", another predefined 256-byte constant table.
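The slides treat S-box and Inv S-box as given constant tables. As a hedged illustration, the sketch below rebuilds both from the standard AES definition (multiplicative inverse in GF(2^8) followed by an affine transformation), which is an assumption brought in from the AES specification rather than something stated on the slides. `build_sbox`, `sub_bytes`, and `inv_sub_bytes` are hypothetical names.

```python
def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254; zero maps to zero by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(x: int) -> int:
    """Affine transformation of the AES S-box (constant '63')."""
    out = 0
    for i in range(8):
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
               ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


def build_sbox():
    sbox = [affine(gf_inv(a)) for a in range(256)]
    inv_sbox = [0] * 256
    for a, s in enumerate(sbox):
        inv_sbox[s] = a
    return sbox, inv_sbox


SBOX, INV_SBOX = build_sbox()


def sub_bytes(state):
    """SubBytes(): one table lookup per byte."""
    return [SBOX[a] for a in state]


def inv_sub_bytes(state):
    """InvSubBytes(): same lookup, different table."""
    return [INV_SBOX[a] for a in state]


print(hex(SBOX[0x00]), hex(INV_SBOX[0x63]))
```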

Basic Function 2a: ShiftRows()

• Each row is shifted cyclically to the left by a different offset: Rows 0, 1, 2, 3 are shifted over 0, 1, 2, 3 byte(s), respectively.

Each number represents the position of the corresponding byte.

Before:            After:
 1  5  9 13         1  5  9 13
 2  6 10 14         6 10 14  2
 3  7 11 15        11 15  3  7
 4  8 12 16        16  4  8 12

Basic Function 2b: InvShiftRows()

• Similar to ShiftRows()
• Rows 0, 1, 2, 3 are shifted over 0, 3, 2, 1 byte(s), respectively.

Before:            After:
 1  5  9 13         1  5  9 13
 6 10 14  2         2  6 10 14
11 15  3  7         3  7 11 15
16  4  8 12         4  8 12 16

Notice that the positions are restored to the original ones if InvShiftRows() is applied after ShiftRows().
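A small sketch of both row shifts, using the same position numbers 1 to 16 as the figures (numbered down the columns of the 4x4 state). The state is represented here as a list of four rows, which is a choice made for this illustration only.

```python
def shift_rows(state):
    """Cyclically shift row r to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Cyclically shift row r to the left by (4 - r) mod 4, i.e. 0, 3, 2, 1."""
    shifted = []
    for r, row in enumerate(state):
        k = (4 - r) % 4
        shifted.append(row[k:] + row[:k])
    return shifted


positions = [
    [1, 5, 9, 13],
    [2, 6, 10, 14],
    [3, 7, 11, 15],
    [4, 8, 12, 16],
]

for row in shift_rows(positions):
    print(row)
```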

Basic Function 3a: MixColumns()

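• The following polynomial multiplication is performed for every column:

$d(x) = c(x) \otimes a(x)$, where $c(x) = '03'x^3 + '01'x^2 + '01'x + '02'$

i.e.,

\[
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
\]

\[
\begin{aligned}
d_0 &= ('02' \cdot a_0) \oplus ('03' \cdot a_1) \oplus a_2 \oplus a_3 \\
d_1 &= a_0 \oplus ('02' \cdot a_1) \oplus ('03' \cdot a_2) \oplus a_3 \\
d_2 &= a_0 \oplus a_1 \oplus ('02' \cdot a_2) \oplus ('03' \cdot a_3) \\
d_3 &= ('03' \cdot a_0) \oplus a_1 \oplus a_2 \oplus ('02' \cdot a_3)
\end{aligned}
\]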

Basic Function 3b: InvMixColumns()

• Similar to MixColumns()
• Uses $c^{-1}(x)$ instead of $c(x)$

$c^{-1}(x) = '0B'x^3 + '0D'x^2 + '09'x + '0E'$

xtime Approach for MixColumns()…

\[
\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
\]

1st element:

d0 = '02'*a0 + '03'*a1 + '01'*a2 + '01'*a3
   = a0 + (a0 + a1 + a2 + a3) + '02'*(a0 + a1)

Notice that a0 + a0 = 0 (XOR).

tmp = a0 ^ a1 ^ a2 ^ a3; t0 = a0;   /* keep the original a0: it is overwritten below */
tm = a0 ^ a1; tm = xtime(tm); a0 ^= tm ^ tmp;
tm = a1 ^ a2; tm = xtime(tm); a1 ^= tm ^ tmp;
tm = a2 ^ a3; tm = xtime(tm); a2 ^= tm ^ tmp;
tm = a3 ^ t0; tm = xtime(tm); a3 ^= tm ^ tmp;
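The xtime formulation of MixColumns() can be compared against the matrix definition. The sketch below computes all four outputs from the original inputs, so no byte is overwritten before it is read; function names are hypothetical.

```python
def xtime(b: int) -> int:
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


MIX_MATRIX = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02],
]


def mix_column_matrix(col):
    """MixColumns() on one column, straight from the matrix definition."""
    out = []
    for row in MIX_MATRIX:
        acc = 0
        for coeff, a in zip(row, col):
            acc ^= gf_mul(coeff, a)
        out.append(acc)
    return out


def mix_column_xtime(col):
    """Same result with one xtime per output byte."""
    a0, a1, a2, a3 = col
    tmp = a0 ^ a1 ^ a2 ^ a3
    return [
        a0 ^ tmp ^ xtime(a0 ^ a1),
        a1 ^ tmp ^ xtime(a1 ^ a2),
        a2 ^ tmp ^ xtime(a2 ^ a3),
        a3 ^ tmp ^ xtime(a3 ^ a0),
    ]


column = [0xDB, 0x13, 0x53, 0x45]
print([hex(v) for v in mix_column_matrix(column)])
print([hex(v) for v in mix_column_xtime(column)])
```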

… and InvMixColumns()

$d(x) = c^{-1}(x) \otimes a(x)$, where $c^{-1}(x) = '0B'x^3 + '0D'x^2 + '09'x + '0E'$

d0 = '0E'*a0 + '0B'*a1 + '0D'*a2 + '09'*a3
   = a0 + (a0 + a1 + a2 + a3) + '02'*(a0 + a1) + '04'*(a0 + a2) + '08'*(a0 + a1 + a2 + a3)

tmp1 = a0 ^ a1 ^ a2 ^ a3; tmp2 = xtime(xtime(xtime(tmp1)));
t0 = a0; t1 = a1;   /* keep the original a0 and a1: they are overwritten below */
tm1 = a0 ^ a2; tm2 = a0 ^ a1; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a0 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a1 ^ a3; tm2 = a1 ^ a2; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a1 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a2 ^ t0; tm2 = a2 ^ a3; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a2 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a3 ^ t1; tm2 = a3 ^ t0; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a3 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;

InvMixColumns() needs more xtime operations than MixColumns().
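The InvMixColumns() decomposition can be sketched the same way. The function below reads only the original column values, and the test vector is the inverse of the commonly quoted MixColumns example column; `inv_mix_column_xtime` and `mix_column_xtime` are hypothetical names.

```python
def xtime(b: int) -> int:
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_column_xtime(col):
    a0, a1, a2, a3 = col
    tmp = a0 ^ a1 ^ a2 ^ a3
    return [
        a0 ^ tmp ^ xtime(a0 ^ a1),
        a1 ^ tmp ^ xtime(a1 ^ a2),
        a2 ^ tmp ^ xtime(a2 ^ a3),
        a3 ^ tmp ^ xtime(a3 ^ a0),
    ]


def inv_mix_column_xtime(col):
    """d_i = a_i + tmp1 + '02'(a_i + a_(i+1)) + '04'(a_i + a_(i+2)) + '08' tmp1."""
    a = list(col)
    tmp1 = a[0] ^ a[1] ^ a[2] ^ a[3]
    tmp2 = xtime(xtime(xtime(tmp1)))                   # '08' * tmp1
    out = []
    for i in range(4):
        tm1 = xtime(xtime(a[i] ^ a[(i + 2) % 4]))      # '04' * (a_i + a_(i+2))
        tm2 = xtime(a[i] ^ a[(i + 1) % 4])             # '02' * (a_i + a_(i+1))
        out.append(a[i] ^ tm1 ^ tm2 ^ tmp1 ^ tmp2)
    return out


mixed = [0x8E, 0x4D, 0xA1, 0xBC]
print([hex(v) for v in inv_mix_column_xtime(mixed)])
```

Each output byte here costs six xtime calls in total (three shared, three per byte), compared with one per byte for MixColumns(), which matches the slide's remark about the extra xtime operations.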

Basic Function 4a: AddRoundKey()

• A 16-byte Round Key is added (i.e., bitwise XORed) to the 16-byte data block. Each data byte is added to the Round Key byte at the same position.

Basic Function 4b: InvAddRoundKey()

• Exactly the same as AddRoundKey()
– This is because subtraction is the same as addition
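Because the operation is a bytewise XOR, applying it twice with the same Round Key restores the data. A minimal sketch (the function name and the sample values are illustrative only):

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR each data byte with the Round Key byte at the same position."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


block = bytes(range(16))
key = bytes([0xA5] * 16)
mixed = add_round_key(block, key)
print(mixed.hex())
print(add_round_key(mixed, key) == block)   # InvAddRoundKey() is the same call
```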

Key Expansion for Encryption

• Key Expansion is the process of generating Round Keys from the Cipher Key.

Length of Round Keys = Block Length * (Number of Rounds + 1)

e.g., if the key size is 128 bits, 10 Rounds are needed, thus 128 * 11 = 1408 bits are needed for Round Keys.

• Needs table lookup operations
– Tables needed:
  • S-box: 256 bytes
  • Rcon: 30 bytes
• Pseudo Code: next slide

Pseudo Code for Key Expansion

KeyExpansion (Key[4*Nk], W[4*(Nr+1)], Nk)
{
    for (i = 0; i < Nk; i++)
        W[i] = (Key[4*i], Key[4*i+1], Key[4*i+2], Key[4*i+3]);
    for (i = Nk; i < 4*(Nr+1); i++) {
        temp = W[i-1];
        if (i % Nk == 0)
            temp = SubWord(RotWord(temp)) ^ Rcon[i/Nk];
        else if (Nk == 8 and i % Nk == 4)
            temp = SubWord(temp);
        W[i] = W[i-Nk] ^ temp;
    }
}

SubWord (W(a, b, c, d))
{ return W(S-box(a), S-box(b), S-box(c), S-box(d)); }

RotWord (W(a, b, c, d))
{ return W(b, c, d, a); }
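A runnable sketch of the pseudo code follows. It makes two assumptions that are not on the slide: the S-box is generated from the standard AES definition instead of being stored as a constant table, and Rcon is produced on the fly with xtime instead of a 30-byte table. All function names are hypothetical.

```python
def xtime(b: int) -> int:
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Assumed standard construction: inverse in GF(2^8), then affine map."""
    sbox = []
    for a in range(256):
        inv = 0
        if a:
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, a)
        s = 0
        for i in range(8):
            bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
                   ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
            s |= bit << i
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def sub_word(word):
    return [SBOX[b] for b in word]


def rot_word(word):
    return word[1:] + word[:1]


def key_expansion(key: bytes):
    """Return the list W of 4*(Nr+1) four-byte words."""
    nk = len(key) // 4            # 4, 6, or 8 words
    nr = nk + 6                   # 10, 12, or 14 Rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                   # first byte of Rcon[1]
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon       # Rcon[i/Nk] only affects the first byte
            rcon = xtime(rcon)
        elif nk == 8 and i % nk == 4:
            temp = sub_word(temp)
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words) * 32, "bits of Round Keys")
print(bytes(words[4]).hex(), bytes(words[43]).hex())
```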

Key Expansion for Decryption

• First, apply the same expansion procedure as in Encryption.
• Second, apply InvMixColumns() to every Round Key except the first and the last one.
– There are (# of Rounds + 1) Round Keys in total, so InvMixColumns() is applied to (# of Rounds - 1) Round Keys.
• Tables needed:
– S-box: 256 bytes
– Rcon: 30 bytes
– Log: 256 bytes
– Alog: 256 bytes

The Log and Alog tables are for InvMixColumns(). The xtime approach is not used here because Key Expansion is done by TinyRISC, which has enough memory to hold the tables.
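The second step can be sketched as a post-processing pass over already expanded Round Keys. This illustration takes the Round Keys as 16-byte values stored column by column and, for brevity, uses shift-and-add multiplication rather than the Log/Alog tables mentioned on the slide; the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


INV_MIX_MATRIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_column(col):
    out = []
    for row in INV_MIX_MATRIX:
        acc = 0
        for coeff, a in zip(row, col):
            acc ^= gf_mul(coeff, a)
        out.append(acc)
    return out


def decryption_round_keys(round_keys):
    """Apply InvMixColumns() to every Round Key except the first and last."""
    last = len(round_keys) - 1
    result = []
    for index, rk in enumerate(round_keys):
        if index in (0, last):
            result.append(bytes(rk))
            continue
        columns = [inv_mix_column(rk[4 * c:4 * c + 4]) for c in range(4)]
        result.append(bytes(b for col in columns for b in col))
    return result


# Dummy Round Keys, for illustration only (not a real key schedule).
dummy_keys = [bytes(range(i, i + 16)) for i in range(11)]
dec_keys = decryption_round_keys(dummy_keys)
print(len(dec_keys), "Round Keys,", len(dec_keys) - 2, "of them transformed")
```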

Part II: Mapping to MorphoSys


Important upgrades in M2

• M2 is the next-generation MorphoSys architecture
• Some new features are important to the AES implementation
– Every RC can do table lookup operations locally. This is realized by an embedded 512-byte memory in each RC.
– The number of registers in each RC is increased from 4 to 8.

Key Expansion Implementation

• Completely done by the general-purpose RISC processor in MorphoSys: TinyRISC
• The resulting Round Keys are saved in main (external) memory for future use

Issues in Data Processing Part

• RC Array Partition
• How to Do (Inv)ShiftRows()
• About (Inv)MixColumns()

Issue 1: RC Array Partition

• Compute in parallel
– 4 data blocks, or 64 bytes, are processed at the same time by 64 RCs
– RC Array partition: choose the scenario on the right.
  • Though not intuitive, it provides a natural data loading/storing order.
  • Thanks to the efficient interconnection network in MorphoSys, the subsequent data moves are still kept simple.

[Figure: two candidate partitions of the 8x8 RC Array among blocks 0 to 3. Left: each block occupies one 4x4 quadrant. Right (marked "good!"): blocks 0, 1, 2, 3 are placed side by side.]

Issue 2: How to Do (Inv)ShiftRows()?

• In intermediate Rounds, (Inv)ShiftRows() is only a position adjustment for the subsequent (Inv)MixColumns().
– It is desirable to have every RC save the data needed for (Inv)MixColumns(), i.e., the data in the same column, during (Inv)ShiftRows(). This makes (Inv)MixColumns() faster.
• But in the final Round, no (Inv)MixColumns() follows (Inv)ShiftRows(), so a simpler data move strategy is used there.

Data Move for ShiftRows() in intermediate Rounds

• Eight steps are needed. The goal is shown below (only one block is shown here; the order within each RC is not important).

Initial state:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   –   –   –     10   –   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   –   –   –     12   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   –   –   –     14   –   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   –   –   –     16   –   –   –

Goal:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   6  11  16      9  14   3   8
Row 1   1   6  11  16      9  14   3   8
Row 2   1   6  11  16      9  14   3   8
Row 3   1   6  11  16      9  14   3   8
Row 4   5  10  15   4     13   2   7  12
Row 5   5  10  15   4     13   2   7  12
Row 6   5  10  15   4     13   2   7  12
Row 7   5  10  15   4     13   2   7  12

Intermediate ShiftRows(): Step 1

$r_1^3 \leftarrow r_0^4,\quad r_1^5 \leftarrow r_0^0$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   –   –   –     10   –   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   5   –   –     12  13   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   1   –   –     14   9   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   –   –   –     16   –   –   –

Intermediate ShiftRows(): Step 2

$r_1^1 \leftarrow r_0^6,\quad r_1^7 \leftarrow r_0^2$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   7   –   –     10  15   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   5   –   –     12  13   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   1   –   –     14   9   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   3   –   –     16  11   –   –

Intermediate ShiftRows(): Step 3, 4

$r_2^0 \leftarrow r_0^1,\quad r_2^1 \leftarrow r_0^0 \;\mid\; r_3^0 \leftarrow r_1^1,\quad r_3^1 \leftarrow r_1^0$ (Mux A = Left (r), Column mode; $r_k^i$ means $r_k$ in Column $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   7  10  15     10  15   2   7
Row 2   3   –   –   –     11   –   –   –
Row 3   4   5   –   –     12  13   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   1   –   –     14   9   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   3  16  11     16  11   8   3

All seeds are ready.

Intermediate ShiftRows(): Step 5

$r_0^{0,1,2,3} \leftarrow r_0^5,\quad r_0^{4,5,6,7} \leftarrow r_0^3$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before (seeds):
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   7  10  15     10  15   2   7
Row 2   3   –   –   –     11   –   –   –
Row 3   4   5   –   –     12  13   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   1   –   –     14   9   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   3  16  11     16  11   8   3

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   –   –   –     14   –   –   –
Row 1   6   7  10  15     14  15   2   7
Row 2   6   –   –   –     14   –   –   –
Row 3   6   5   –   –     14  13   –   –
Row 4   4   –   –   –     12   –   –   –
Row 5   4   1   –   –     12   9   –   –
Row 6   4   –   –   –     12   –   –   –
Row 7   4   3  16  11     12  11   8   3

Intermediate ShiftRows(): Step 6

$r_1^{0,1,2,3} \leftarrow r_1^5,\quad r_1^{4,5,6,7} \leftarrow r_1^3$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   –   –   –     14   –   –   –
Row 1   6   7  10  15     14  15   2   7
Row 2   6   –   –   –     14   –   –   –
Row 3   6   5   –   –     14  13   –   –
Row 4   4   –   –   –     12   –   –   –
Row 5   4   1   –   –     12   9   –   –
Row 6   4   –   –   –     12   –   –   –
Row 7   4   3  16  11     12  11   8   3

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   1   –   –     14   9   –   –
Row 1   6   1  10  15     14   9   2   7
Row 2   6   1   –   –     14   9   –   –
Row 3   6   1   –   –     14   9   –   –
Row 4   4   5   –   –     12  13   –   –
Row 5   4   5   –   –     12  13   –   –
Row 6   4   5   –   –     12  13   –   –
Row 7   4   5  16  11     12  13   8   3

Intermediate ShiftRows(): Step 7

$r_2^{0,1,2,3} \leftarrow r_2^7,\quad r_2^{4,5,6,7} \leftarrow r_2^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   1   –   –     14   9   –   –
Row 1   6   1  10  15     14   9   2   7
Row 2   6   1   –   –     14   9   –   –
Row 3   6   1   –   –     14   9   –   –
Row 4   4   5   –   –     12  13   –   –
Row 5   4   5   –   –     12  13   –   –
Row 6   4   5   –   –     12  13   –   –
Row 7   4   5  16  11     12  13   8   3

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   1  16   –     14   9   8   –
Row 1   6   1  16  15     14   9   8   7
Row 2   6   1  16   –     14   9   8   –
Row 3   6   1  16   –     14   9   8   –
Row 4   4   5  10   –     12  13   2   –
Row 5   4   5  10   –     12  13   2   –
Row 6   4   5  10   –     12  13   2   –
Row 7   4   5  10  11     12  13   2   3

Intermediate ShiftRows(): Step 8

$r_3^{0,1,2,3} \leftarrow r_3^7,\quad r_3^{4,5,6,7} \leftarrow r_3^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   1  16   –     14   9   8   –
Row 1   6   1  16  15     14   9   8   7
Row 2   6   1  16   –     14   9   8   –
Row 3   6   1  16   –     14   9   8   –
Row 4   4   5  10   –     12  13   2   –
Row 5   4   5  10   –     12  13   2   –
Row 6   4   5  10   –     12  13   2   –
Row 7   4   5  10  11     12  13   2   3

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   6   1  16  11     14   9   8   3
Row 1   6   1  16  11     14   9   8   3
Row 2   6   1  16  11     14   9   8   3
Row 3   6   1  16  11     14   9   8   3
Row 4   4   5  10  15     12  13   2   7
Row 5   4   5  10  15     12  13   2   7
Row 6   4   5  10  15     12  13   2   7
Row 7   4   5  10  15     12  13   2   7
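The eight data-move steps for the intermediate ShiftRows() can be replayed in a small simulation. This is a sketch under stated assumptions, not MorphoSys code: the register moves are read off the step tables, each step is assumed to read all of its sources before writing, and the helpers `row_moves` and `column_exchange` are hypothetical stand-ins for the Express Lane (Row mode) and Mux A (Column mode) transfers.

```python
import copy

ROWS, COLS = 8, 2   # one block occupies 8 rows x 2 columns of RCs


def initial_block():
    """regs[row][col] = [r0, r1, r2, r3]; positions 1..16 start in r0."""
    return [[[row + 1 + 8 * col, None, None, None] for col in range(COLS)]
            for row in range(ROWS)]


def row_moves(regs, moves):
    """Row mode: each move is (dst_rows, dst_reg, src_row, src_reg)."""
    old = copy.deepcopy(regs)           # all sources are read before any write
    for dst_rows, dst_reg, src_row, src_reg in moves:
        for row in dst_rows:
            for col in range(COLS):
                regs[row][col][dst_reg] = old[src_row][col][src_reg]


def column_exchange(regs, rows, dst_reg, src_reg):
    """Column mode: each column copies src_reg of the other column."""
    old = copy.deepcopy(regs)
    for row in rows:
        regs[row][0][dst_reg] = old[row][1][src_reg]
        regs[row][1][dst_reg] = old[row][0][src_reg]


def intermediate_shift_rows(regs):
    upper, lower = [0, 1, 2, 3], [4, 5, 6, 7]
    row_moves(regs, [([3], 1, 4, 0), ([5], 1, 0, 0)])      # Step 1
    row_moves(regs, [([1], 1, 6, 0), ([7], 1, 2, 0)])      # Step 2
    column_exchange(regs, [1, 7], 2, 0)                    # Step 3
    column_exchange(regs, [1, 7], 3, 1)                    # Step 4
    row_moves(regs, [(upper, 0, 5, 0), (lower, 0, 3, 0)])  # Step 5
    row_moves(regs, [(upper, 1, 5, 1), (lower, 1, 3, 1)])  # Step 6
    row_moves(regs, [(upper, 2, 7, 2), (lower, 2, 1, 2)])  # Step 7
    row_moves(regs, [(upper, 3, 7, 3), (lower, 3, 1, 3)])  # Step 8
    return regs


final = intermediate_shift_rows(initial_block())
for row in range(ROWS):
    print("Row", row, final[row][0], final[row][1])
```

After the last step every RC holds the four positions of one column of the shifted state (for example 1, 6, 11, 16), although not in the same order as on the goal slide, which notes that the order is not important.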

Data Move for InvShiftRows() in intermediate Rounds

• Similarly, eight steps are needed. The goal is shown below (only one block is shown here; the order within each RC is not important).

Initial state:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   –   –   –     10   –   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   –   –   –     12   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   –   –   –     14   –   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   –   –   –     16   –   –   –

Goal:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1  14  11   8      9   6   3  16
Row 1   1  14  11   8      9   6   3  16
Row 2   1  14  11   8      9   6   3  16
Row 3   1  14  11   8      9   6   3  16
Row 4   5   2  15  12     13  10   7   4
Row 5   5   2  15  12     13  10   7   4
Row 6   5   2  15  12     13  10   7   4
Row 7   5   2  15  12     13  10   7   4

Intermediate InvShiftRows(): Step 1

$r_1^1 \leftarrow r_0^4,\quad r_1^7 \leftarrow r_0^0$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   5   –   –     10  13   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   –   –   –     12   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   –   –   –     14   –   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   1   –   –     16   9   –   –

Intermediate InvShiftRows(): Step 2

$r_1^5 \leftarrow r_0^2,\quad r_1^3 \leftarrow r_0^6$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   5   –   –     10  13   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   7   –   –     12  15   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   3   –   –     14  11   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   1   –   –     16   9   –   –

Intermediate InvShiftRows(): Step 3, 4

$r_2^0 \leftarrow r_0^1,\quad r_2^1 \leftarrow r_0^0 \;\mid\; r_3^0 \leftarrow r_1^1,\quad r_3^1 \leftarrow r_1^0$ (Mux A = Left (r), Column mode; $r_k^i$ means $r_k$ in Column $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   5   –   –     10  13   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   7  12  15     12  15   4   7
Row 4   5   –   –   –     13   –   –   –
Row 5   6   3  14  11     14  11   6   3
Row 6   7   –   –   –     15   –   –   –
Row 7   8   1   –   –     16   9   –   –

All seeds are ready.

Intermediate InvShiftRows(): Step 5

$r_0^{0,1,2,3} \leftarrow r_0^7,\quad r_0^{4,5,6,7} \leftarrow r_0^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before (seeds):
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   5   –   –     10  13   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   7  12  15     12  15   4   7
Row 4   5   –   –   –     13   –   –   –
Row 5   6   3  14  11     14  11   6   3
Row 6   7   –   –   –     15   –   –   –
Row 7   8   1   –   –     16   9   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   –   –   –     16   –   –   –
Row 1   8   5   –   –     16  13   –   –
Row 2   8   –   –   –     16   –   –   –
Row 3   8   7  12  15     16  15   4   7
Row 4   2   –   –   –     10   –   –   –
Row 5   2   3  14  11     10  11   6   3
Row 6   2   –   –   –     10   –   –   –
Row 7   2   1   –   –     10   9   –   –

Intermediate InvShiftRows(): Step 6

$r_1^{0,1,2,3} \leftarrow r_1^7,\quad r_1^{4,5,6,7} \leftarrow r_1^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   –   –   –     16   –   –   –
Row 1   8   5   –   –     16  13   –   –
Row 2   8   –   –   –     16   –   –   –
Row 3   8   7  12  15     16  15   4   7
Row 4   2   –   –   –     10   –   –   –
Row 5   2   3  14  11     10  11   6   3
Row 6   2   –   –   –     10   –   –   –
Row 7   2   1   –   –     10   9   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   1   –   –     16   9   –   –
Row 1   8   1   –   –     16   9   –   –
Row 2   8   1   –   –     16   9   –   –
Row 3   8   1  12  15     16   9   4   7
Row 4   2   5   –   –     10  13   –   –
Row 5   2   5  14  11     10  13   6   3
Row 6   2   5   –   –     10  13   –   –
Row 7   2   5   –   –     10  13   –   –

Intermediate InvShiftRows(): Step 7

$r_2^{0,1,2,3} \leftarrow r_2^5,\quad r_2^{4,5,6,7} \leftarrow r_2^3$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   1   –   –     16   9   –   –
Row 1   8   1   –   –     16   9   –   –
Row 2   8   1   –   –     16   9   –   –
Row 3   8   1  12  15     16   9   4   7
Row 4   2   5   –   –     10  13   –   –
Row 5   2   5  14  11     10  13   6   3
Row 6   2   5   –   –     10  13   –   –
Row 7   2   5   –   –     10  13   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   1  14   –     16   9   6   –
Row 1   8   1  14   –     16   9   6   –
Row 2   8   1  14   –     16   9   6   –
Row 3   8   1  14  15     16   9   6   7
Row 4   2   5  12   –     10  13   4   –
Row 5   2   5  12  11     10  13   4   3
Row 6   2   5  12   –     10  13   4   –
Row 7   2   5  12   –     10  13   4   –

Intermediate InvShiftRows(): Step 8

$r_3^{0,1,2,3} \leftarrow r_3^5,\quad r_3^{4,5,6,7} \leftarrow r_3^3$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   1  14   –     16   9   6   –
Row 1   8   1  14   –     16   9   6   –
Row 2   8   1  14   –     16   9   6   –
Row 3   8   1  14  15     16   9   6   7
Row 4   2   5  12   –     10  13   4   –
Row 5   2   5  12  11     10  13   4   3
Row 6   2   5  12   –     10  13   4   –
Row 7   2   5  12   –     10  13   4   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   8   1  14  11     16   9   6   3
Row 1   8   1  14  11     16   9   6   3
Row 2   8   1  14  11     16   9   6   3
Row 3   8   1  14  11     16   9   6   3
Row 4   2   5  12  15     10  13   4   7
Row 5   2   5  12  15     10  13   4   7
Row 6   2   5  12  15     10  13   4   7
Row 7   2   5  12  15     10  13   4   7

Data Move for ShiftRows() in Final Round

• Five steps are needed. The goal is shown below (only one block is shown here; only r0 is used for the result).

Initial state:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   –   –   –     10   –   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   –   –   –     12   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   –   –   –     14   –   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   –   –   –     16   –   –   –

Goal:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   6   –   –   –     14   –   –   –
Row 2  11   –   –   –      3   –   –   –
Row 3  16   –   –   –      8   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5  10   –   –   –      2   –   –   –
Row 6  15   –   –   –      7   –   –   –
Row 7   4   –   –   –     12   –   –   –

Final ShiftRows(): Step 1, 2

$r_1^3 \leftarrow r_0^7,\quad r_1^7 \leftarrow r_0^3 \;\mid\; r_1^1 \leftarrow r_0^5,\quad r_1^5 \leftarrow r_0^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   4   –   –     16  12   –   –

Final ShiftRows(): Step 3, 4

$r_2^0 \leftarrow r_0^1,\quad r_2^1 \leftarrow r_0^0 \;\mid\; r_3^0 \leftarrow r_1^1,\quad r_3^1 \leftarrow r_1^0$ (Mux A = Left (r), Column mode; $r_k^i$ means $r_k$ in Column $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   4   –   –     16  12   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –  11   –     11   –   3   –
Row 3   4   8   –  16     12  16   –   8
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –  10     14  10   –   2
Row 6   7   –  15   –     15   –   7   –
Row 7   8   4   –   –     16  12   –   –

Final ShiftRows(): Step 5

Row mode; in each row, r0 is loaded from the register listed:

Row 0: r0 ← r0
Row 1: r0 ← r1
Row 2: r0 ← r2
Row 3: r0 ← r3
Row 4: r0 ← r0
Row 5: r0 ← r3
Row 6: r0 ← r2
Row 7: r0 ← r1

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –  11   –     11   –   3   –
Row 3   4   8   –  16     12  16   –   8
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –  10     14  10   –   2
Row 6   7   –  15   –     15   –   7   –
Row 7   8   4   –   –     16  12   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   6   –   –   –     14   –   –   –
Row 2  11   –   –   –      3   –   –   –
Row 3  16   –   –   –      8   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5  10   –   –   –      2   –   –   –
Row 6  15   –   –   –      7   –   –   –
Row 7   4   –   –   –     12   –   –   –

Data Move for InvShiftRows() in Final Round

• Five steps are needed. The goal is shown below (only one block is shown here; only r0 is used for the result).

Initial state:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   –   –   –     10   –   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   –   –   –     12   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   –   –   –     14   –   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   –   –   –     16   –   –   –

Goal:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1  14   –   –   –      6   –   –   –
Row 2  11   –   –   –      3   –   –   –
Row 3   8   –   –   –     16   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   2   –   –   –     10   –   –   –
Row 6  15   –   –   –      7   –   –   –
Row 7  12   –   –   –      4   –   –   –

Final InvShiftRows(): Step 1, 2

$r_1^3 \leftarrow r_0^7,\quad r_1^7 \leftarrow r_0^3 \;\mid\; r_1^1 \leftarrow r_0^5,\quad r_1^5 \leftarrow r_0^1$ (Express Lane, Row mode; $r_k^i$ means $r_k$ in Row $i$)

       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   4   –   –     16  12   –   –

Final InvShiftRows(): Step 3, 4

$r_2^0 \leftarrow r_0^1,\quad r_2^1 \leftarrow r_0^0 \;\mid\; r_3^0 \leftarrow r_1^1,\quad r_3^1 \leftarrow r_1^0$ (Mux A = Left (r), Column mode; $r_k^i$ means $r_k$ in Column $i$)

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –   –     10  14   –   –
Row 2   3   –   –   –     11   –   –   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –   –   –     15   –   –   –
Row 7   8   4   –   –     16  12   –   –

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –  14     10  14   –   6
Row 2   3   –  11   –     11   –   3   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –  15   –     15   –   7   –
Row 7   8   4   –  12     16  12   –   4

Final InvShiftRows(): Step 5

Row mode; in each row, r0 is loaded from the register listed:

Row 0: r0 ← r0
Row 1: r0 ← r3
Row 2: r0 ← r2
Row 3: r0 ← r1
Row 4: r0 ← r0
Row 5: r0 ← r1
Row 6: r0 ← r2
Row 7: r0 ← r3

Before:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1   2   6   –  14     10  14   –   6
Row 2   3   –  11   –     11   –   3   –
Row 3   4   8   –   –     12  16   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   6   2   –   –     14  10   –   –
Row 6   7   –  15   –     15   –   7   –
Row 7   8   4   –  12     16  12   –   4

After:
       Column 0           Column 1
       r0  r1  r2  r3     r0  r1  r2  r3
Row 0   1   –   –   –      9   –   –   –
Row 1  14   –   –   –      6   –   –   –
Row 2  11   –   –   –      3   –   –   –
Row 3   8   –   –   –     16   –   –   –
Row 4   5   –   –   –     13   –   –   –
Row 5   2   –   –   –     10   –   –   –
Row 6  15   –   –   –      7   –   –   –
Row 7  12   –   –   –      4   –   –   –

Issue 3: (Inv)MixColumns()

• The xtime approach is used. Since all the data involved are saved in local registers, the implementation is straightforward.

• Because InvMixColumns() needs more xtime and XOR operations, it is slower than MixColumns().

• It might seem that decryption needs many more contexts than encryption because of the more complex InvMixColumns(). However, with careful arrangement of registers and table lookup operations, decryption needs only one more context.

Part III: Performance Evaluation


Key Expansion Comparison

# of cycles for Key Expansion:

Key Size   AES CD (ANSI C)             Brian Gladman (VC++)        MorphoSys TinyRISC
           Cipher   Inverse Cipher     Cipher   Inverse Cipher     Cipher   Inverse Cipher
128        2100     2900               305      1389               2770     13320
192        2600     3600               277      1595               3386     15603
256        2800     3800               374      1960               4196     19184

The statistics for ANSI C and C++ are obtained from the AES proposal by Rijndael's authors.

Notice:
• The cycle time is not clear here.
• The Key Expansion time is not important. For MorphoSys M2, the designed clock frequency is 200 MHz, so even the largest count, 19184 cycles, takes just 96 us.

Initialization Time: MorphoSys

Key Size   Key Expansion   Table Loading   Context Loading   Total # of cycles
128        2770/13320      6249            230/238           9249/19807
192        3386/16029      6249            230/238           9865/22516
256        4196/19184      6249            230/238           10675/25671

x/y: # of cycles for encryption/decryption

Notice:
• Initialization includes three parts: Key Expansion, Table Loading, and Context Loading. They are done only once in a session.
• The Initialization time is very short. When f = 200 MHz, the longest time is 128 us (25671 cycles).
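The totals and times in the initialization table follow from simple arithmetic, sketched below with the figures copied from the table. The dictionary layout and function names are choices made for this illustration.

```python
CLOCK_HZ = 200e6          # designed clock frequency of MorphoSys M2
TABLE_LOADING = 6249      # cycles, same for encryption and decryption
CONTEXT_LOADING = {"enc": 230, "dec": 238}
KEY_EXPANSION = {
    128: {"enc": 2770, "dec": 13320},
    192: {"enc": 3386, "dec": 16029},
    256: {"enc": 4196, "dec": 19184},
}


def init_cycles(key_bits: int, mode: str) -> int:
    """Key Expansion + Table Loading + Context Loading."""
    return KEY_EXPANSION[key_bits][mode] + TABLE_LOADING + CONTEXT_LOADING[mode]


def cycles_to_us(cycles: int) -> float:
    return cycles / CLOCK_HZ * 1e6


for bits in (128, 192, 256):
    for mode in ("enc", "dec"):
        total = init_cycles(bits, mode)
        print(bits, mode, total, f"{cycles_to_us(total):.1f} us")
```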

Data Processing: MorphoSys

Key Size   Encryption                   Decryption
           # of cycles   Throughput     # of cycles   Throughput
128        150.25        170.4 Mb/s     166           154.2 Mb/s
192        175.25        146.1 Mb/s     194.5         131.6 Mb/s
256        200.25        127.8 Mb/s     223           114.8 Mb/s

Notice:
• These statistics are for the data processing part.
• Because MorphoSys is able to process four blocks at the same time, the actual number of cycles for one block is only 1/4 of the computing cycles.
• The throughput is calculated at f = 200 MHz.
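The throughput column can be reproduced from the cycle counts, assuming the listed cycles are per 128-bit block (which is what makes the numbers agree). The function name is hypothetical.

```python
def throughput_mbps(cycles_per_block: float,
                    clock_hz: float = 200e6,
                    block_bits: int = 128) -> float:
    """Throughput in Mb/s for one 128-bit block every cycles_per_block cycles."""
    return block_bits * clock_hz / cycles_per_block / 1e6


CYCLES = {
    128: {"enc": 150.25, "dec": 166},
    192: {"enc": 175.25, "dec": 194.5},
    256: {"enc": 200.25, "dec": 223},
}

for bits, modes in CYCLES.items():
    for mode, cycles in modes.items():
        print(bits, mode, f"{throughput_mbps(cycles):.1f} Mb/s")
```

Under the same assumption, four blocks processed together take four times the listed count, for example 4 * 150.25 = 601 cycles for 128-bit encryption.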

Data Processing: Other Implementations

Key Size   Intel 8051    Motorola 68HC08   AES CD (ANSI C)             Brian Gladman (VC++)        Java
           # of cycles   # of cycles       # of cycles   Throughput    # of cycles   Throughput    # of cycles   Throughput
128        4065          8390              950           27.0          363           70.5          23000         1.1
192        4512          10780             1125          22.8          432           59.3          27600         0.93
256        5221          12490             1295          19.8          500           51.2          32300         0.79

Notice:
• The throughput (in Mb/s) is calculated at f = 200 MHz.
• Much slower than the MorphoSys implementation.

Comparison: Data Processing

[Bar chart: # of Cycles in Different Implementations. Bars for Intel 8051, Motorola 68HC08, ANSI C, C++, Java, and MorphoSys; vertical axis from 0 to 25000 cycles.]

Another Fast Implementation

• On Aug 8, 2001, Amphion Semiconductor Ltd. announced its application-specific cores for AES. They are faster than the MorphoSys implementation.
• But we should also consider the following issues:
– Encryption and decryption need different Amphion cores
– Initialization time in Amphion cores is unknown (but this is not important provided that it is not very long)
– MorphoSys is not just an ASIC or FPGA, and is capable of running many other applications efficiently with the same architecture.

Amphion ASIC Cores

TSMC 0.18um Technology

Key Size   Logic Gates   Encryption                                    Decryption
                         Timing Constraints (MHz)  Throughput (Mb/s)   Timing Constraints (MHz)  Throughput (Mb/s)
128        18.2K         200                       581                 200                       581
192        18.2K         200                       492                 200                       492
256        18.2K         200                       426                 200                       426

Notice:
• About 240% to 270% faster than the MorphoSys implementation

Amphion FPGA Cores (1)

Altera APEX20KE-1

Key Size   Logic Used (LE)*   Memory Used (ESB)   Encryption                             Decryption
                                                  Clock Speed (MHz)  Throughput (Mb/s)   Clock Speed (MHz)  Throughput (Mb/s)
128        1452/1560          8                   77.8               226                 74.1               215
192        1452/1560          8                   77.8               191                 74.1               182
256        1452/1560          8                   77.8               166                 74.1               158

Notice:
• About 30% faster than the MorphoSys implementation

Amphion FPGA Cores (2)

Xilinx VirtexE-8

Key Size   Logic Used (LUT)*   Memory Used (BRAM)   Encryption                             Decryption
                                                    Clock Speed (MHz)  Throughput (Mb/s)   Clock Speed (MHz)  Throughput (Mb/s)
128        1008/1092           4                    92.3               268                 86.7               254
192        1008/1092           8                    92.3               227                 86.7               213
256        1008/1092           8                    92.3               196                 86.7               184

Notice:
• About 60% faster than the MorphoSys implementation
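The "faster than MorphoSys" percentages can be recomputed from the throughput figures in the tables. The sketch below assumes that "x% faster" means (Amphion throughput / MorphoSys throughput - 1) * 100; the data layout and names are illustrative only.

```python
# Throughput in Mb/s as (encryption, decryption), copied from the tables.
MORPHOSYS = {128: (170.4, 154.2), 192: (146.1, 131.6), 256: (127.8, 114.8)}
AMPHION = {
    "ASIC (TSMC 0.18um)": {128: (581, 581), 192: (492, 492), 256: (426, 426)},
    "FPGA (Altera APEX20KE-1)": {128: (226, 215), 192: (191, 182), 256: (166, 158)},
    "FPGA (Xilinx VirtexE-8)": {128: (268, 254), 192: (227, 213), 256: (196, 184)},
}


def percent_faster(other: float, base: float) -> float:
    """How much faster 'other' is than 'base', in percent."""
    return (other / base - 1.0) * 100.0


for core, table in AMPHION.items():
    gains = [
        percent_faster(table[bits][i], MORPHOSYS[bits][i])
        for bits in (128, 192, 256)
        for i in (0, 1)
    ]
    print(f"{core}: {min(gains):.0f}% to {max(gains):.0f}% faster")
```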