Mapping the AES Algorithm to MorphoSys Architecture
Ye Tang
Aug 2001
MorphoSys Project, UC Irvine
Overview
• Part I: AES Algorithm Introduction
• Part II: Mapping to MorphoSys
• Part III: Performance Evaluation
Part I: AES Introduction
What Is AES?
• AES: Advanced Encryption Standard
• Next-generation cryptographic algorithm for use by U.S. Government organizations to protect sensitive (unclassified) information.
• Expected to eventually replace the current standard, DES (Data Encryption Standard), and Triple DES.
AES Development Timeline
• The National Institute of Standards and Technology (NIST) worked with industry and the cryptographic community to develop AES:
– Jan 1997: NIST announced the initiation of the AES development effort.
– Aug 1998: NIST announced that fifteen algorithms had been selected as candidates.
– Apr 1999: NIST selected five of the fifteen as AES finalists.
– Oct 2000: NIST announced that Rijndael had been selected for the AES.
– Feb 2001: Draft FIPS for the AES published for public comment.
– May 2001: Comment period closed.
– ? 2001: AES FIPS becomes official.
Rijndael Overview
• A symmetric block cipher developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen
• Applies to data blocks of 128 bits, using Cipher Keys with lengths of 128, 192, or 256 bits
• Very good performance in both hardware and software across a wide range of computing platforms
• Very high security level: even with future advances in technology, it has the potential to remain secure well beyond twenty years.
Terms Used in Rijndael
• Round
– Rijndael is an iterated block cipher; each iteration is called a Round. Rijndael has 10, 12, or 14 Rounds when the Cipher Key size is 128, 192, or 256 bits, respectively.
• Cipher Key
– The original secret key used for encryption or decryption, also abbreviated as "Key". Its size can be 128, 192, or 256 bits.
• Round Key
– One of a series of keys derived from the Cipher Key. Every Round needs a Round Key (plus one more for the initial step), and every Round Key is 128 bits, the same size as the data block.
• Key Expansion
– The routine used to generate all Round Keys from the Cipher Key.
Basic Structure of Encryption
• Initialization
– Key Expansion: derive all necessary Round Keys from the Cipher Key.
• Data Processing
1. Initial step
   1. RoundKeyAddition()
2. Intermediate Rounds (9, 11, or 13 Rounds, i.e., every Round but the last)
   1. SubBytes()
   2. ShiftRows()
   3. MixColumns()
   4. RoundKeyAddition()
3. Final Round
   1. SubBytes()
   2. ShiftRows()
   3. RoundKeyAddition()
In shorthand (A = RoundKeyAddition, B = SubBytes, S = ShiftRows, M = MixColumns), the sequence is:
A BSMA BSMA … BSA
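As a minimal sketch of the round structure above (plain C, not MorphoSys code), the function below builds the operation sequence in the slide's shorthand. It assumes the letters A, B, S, M stand for RoundKeyAddition, SubBytes, ShiftRows, and MixColumns; `encryption_trace` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: writes the encryption sequence in shorthand.
   A = RoundKeyAddition, B = SubBytes, S = ShiftRows, M = MixColumns.
   nr is the total number of Rounds (10, 12, or 14); out needs 4*nr + 1 bytes. */
void encryption_trace(int nr, char *out)
{
    strcpy(out, "A");                            /* initial step */
    for (int round = 1; round < nr; round++)     /* nr - 1 intermediate Rounds */
        strcat(out, "BSMA");
    strcat(out, "BSA");                          /* final Round: no MixColumns */
}
```

For nr = 10 the trace has 40 characters: one A, nine BSMA groups, and a closing BSA.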
Basic Structure of Decryption
• Initialization
– Key Expansion: derive all necessary Round Keys from the Cipher Key. The detailed procedure differs from that for encryption.
• Data Processing (each function replaced by its inverse)
1. Initial step
   1. InvRoundKeyAddition()
2. Intermediate Rounds (9, 11, or 13 Rounds)
   1. InvSubBytes()
   2. InvShiftRows()
   3. InvMixColumns()
   4. InvRoundKeyAddition()
3. Final Round
   1. InvSubBytes()
   2. InvShiftRows()
   3. InvRoundKeyAddition()
This is the same sequence as for encryption:
A BSMA BSMA … BSA
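Read together with the Key Expansion for Decryption slide, this sequence consumes the Round Keys from last to first. The sketch below (hypothetical helper `decryption_key_order`, assuming the structure shown on this slide) lists which Round Key index each InvRoundKeyAddition() uses. Indices 1 to Nr - 1 are the keys that receive InvMixColumns() during Key Expansion.

```c
/* Hypothetical helper: fills keys[] with the Round Key index used by each
   InvRoundKeyAddition() in the A BSMA ... BSA decryption sequence and
   returns how many were used (nr + 1). Keys run from last (nr) to first (0). */
int decryption_key_order(int nr, int keys[])
{
    int n = 0;
    keys[n++] = nr;                              /* initial step */
    for (int round = nr - 1; round >= 1; round--)
        keys[n++] = round;                       /* intermediate Rounds: InvMixColumns()-transformed keys */
    keys[n++] = 0;                               /* final Round: first Round Key, untransformed */
    return n;
}
```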
Math Background (1)
• Galois Field GF(2^8)
– A byte b, consisting of bits b7 b6 b5 b4 b3 b2 b1 b0, is considered as a polynomial with coefficients in {0,1}:
$$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$$
• Addition
– The coefficients of the sum are the sums of the corresponding coefficients of the two terms, modulo 2.
Example: '57' + '83' = 'D4' (hexadecimal)
$$(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$$
– Addition corresponds to bitwise XOR: '57' ⊕ '83' = 'D4'
• Subtraction (i.e., the inverse of addition)
– The same as addition (also bitwise XOR)
Math Background (2)
• Multiplication
– Multiplication in GF(2^8) corresponds to multiplication of polynomials modulo an irreducible binary polynomial of degree 8. For Rijndael, this irreducible polynomial is called m(x) and is given by
$$m(x) = x^8 + x^4 + x^3 + x + 1$$
Example: '57' • '83' = 'C1'
$$(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) = x^{13} + x^{11} + x^9 + x^8 + x^7 + x^7 + x^5 + x^3 + x^2 + x + x^6 + x^4 + x^2 + x + 1 = x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$$
$$(x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1) \bmod (x^8 + x^4 + x^3 + x + 1) = x^7 + x^6 + 1$$
The result, $x^7 + x^6 + 1$, is 'C1'.
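The multiplication rule can be checked with a short C sketch. It uses the shift-and-add method, a standard technique swapped in here for illustration (it is not the table-lookup method used on MorphoSys); `gf_mul` is a hypothetical name.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1
   with the shift-and-add method: for every set bit of b, add (XOR) the
   matching multiple a * x^i, reducing a by m(x) each time it overflows. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                                          /* addition is XOR */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00)); /* a = a * x mod m(x) */
        b >>= 1;
    }
    return p;
}
```

With this, gf_mul(0x57, 0x83) reproduces the slide's 'C1'.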
Math Background (3)
• Multiplication by x (xtime operation)
– Before reduction modulo m(x), the result is:
$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$$
  • If b7 = 0, no reduction is needed;
  • If b7 = 1, m(x) must be subtracted (i.e., XORed).
– In other words, b(x) • x can be implemented as a one-bit left shift followed by a conditional XOR with '1B'.
– Multiplication by x is denoted by a = xtime(b).
– Examples (the bit in parentheses is the bit shifted out):
'57' • '02' = xtime('57') = (0)10101110 = 'AE'
'57' • '04' = xtime(xtime('57')) = xtime('AE') = (1)01011100 ⊕ '1B' = '47'
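A direct C rendering of xtime, as a sketch: shift left one bit and XOR with 0x1B when the outgoing bit b7 was set. Applying it repeatedly gives multiplication by '02', '04', '08', '10', and so on.

```c
#include <stdint.h>

/* xtime: multiply b(x) by x in GF(2^8). Shift left by one bit; if b7 was 1,
   subtract (XOR) m(x), which after dropping the x^8 term is '1B'. */
uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}
```

For example, xtime('57') = 'AE' and xtime('AE') = '47', matching the slide; two more applications give '57' • '10' = '07'.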
Math Background (4)
• How is multiplication done in Rijndael? (e.g., '57' • '13')
– 1st approach: table lookup, using two tables, a logarithm table (Logtable) and an antilogarithm table (Alogtable):
mul('57','13') = Alogtable[(Logtable['57'] + Logtable['13']) % 255] = Alogtable[(98 + 14) % 255] = Alogtable[112] = 254 = 'FE'
– 2nd approach: use the xtime operation:
'57' • '13' = '57' • ('01' ⊕ '02' ⊕ '10') = ('57' • '01') ⊕ ('57' • '02') ⊕ ('57' • '10') = '57' ⊕ 'AE' ⊕ '07' = 'FE'
Notice: xtime can be implemented directly in hardware, but in MorphoSys it is still implemented by table lookup. The difference from the 1st approach is that it needs only one table.
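The two tables of the 1st approach can be generated rather than stored. This sketch assumes '03' as the generator, as in the Rijndael reference code, so Alogtable[i] = '03'^i; the table names follow the slide, while `init_log_tables` is a hypothetical helper.

```c
#include <stdint.h>

static uint8_t Logtable[256], Alogtable[256];

/* Fill both tables by repeated multiplication by the generator '03'
   (x * '03' = x ^ xtime(x)). */
void init_log_tables(void)
{
    uint8_t x = 1;                                            /* '03'^0 */
    for (int i = 0; i < 255; i++) {
        Alogtable[i] = x;
        Logtable[x] = (uint8_t)i;
        x ^= (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00)); /* x = x * '03' */
    }
    Alogtable[255] = Alogtable[0];                            /* '03'^255 = '01' */
}

/* Table-lookup multiplication as on the slide; '00' has no logarithm. */
uint8_t mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return Alogtable[(Logtable[a] + Logtable[b]) % 255];
}
```

After init_log_tables(), Logtable['13'] is 14 and mul('57', '13') returns 'FE', the same value the xtime approach gives.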
Math Background (5)
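• Polynomial multiplication (coefficients are bytes in GF(2^8))
$$a(x) = a_3x^3 + a_2x^2 + a_1x + a_0, \qquad b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$$
$$c(x) = a(x) \cdot b(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$$
$$d(x) = c(x) \bmod (x^4 + 1) \quad \text{(final result)}$$
Because $x^i \bmod (x^4 + 1) = x^{i \bmod 4}$, after some simplification d(x) can be represented as $d(x) = a(x) \otimes b(x)$:
$$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}$$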
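The matrix form corresponds to a short loop: coefficient k of d(x) collects every product a_i • b_j with i + j ≡ k (mod 4). The sketch below uses shift-and-add GF(2^8) multiplication; `poly_mul4` and `gmul` are hypothetical names.

```c
#include <stdint.h>

/* GF(2^8) product modulo m(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
    }
    return p;
}

/* d(x) = a(x) (x) b(x) mod (x^4 + 1); index 0 is the constant term.
   Row k of the matrix: d_k = sum over i of a_i * b_{(k - i) mod 4}. */
void poly_mul4(const uint8_t a[4], const uint8_t b[4], uint8_t d[4])
{
    for (int k = 0; k < 4; k++) {
        d[k] = 0;
        for (int i = 0; i < 4; i++)
            d[k] ^= gmul(a[i], b[(k - i + 4) % 4]);
    }
}
```

With a(x) = '03'x^3 + '01'x^2 + '01'x + '02' (stored as {02, 01, 01, 03}) this is exactly MixColumns(). Multiplying it by '0B'x^3 + '0D'x^2 + '09'x + '0E' yields the polynomial 1.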
Basic Function 1a: SubBytes()
• 'b' = SubBytes('a')
– Substitute 'a' with 'b', the element at address 'a' in table S-box. This is applied to every byte of the data block.
– Table S-box is a predefined 256-byte constant table.
– Example: SubBytes('00') = '63' ('63' is the first element in S-box)
Basic Function 1b: InvSubBytes()
• Similar to SubBytes()
• Only difference: the table is "Inv S-box", another predefined 256-byte constant table.
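The two 256-byte constant tables do not have to be typed in by hand. This sketch assumes the standard Rijndael S-box construction, which these slides do not describe: take the multiplicative inverse in GF(2^8), with '00' mapped to '00', then apply an affine map with constant '63'. Inv S-box is simply the inverse permutation. All function names are hypothetical.

```c
#include <stdint.h>

static uint8_t sbox[256], inv_sbox[256];

/* GF(2^8) product modulo m(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
    }
    return p;
}

/* Assumed construction: s = inv ^ rotl(inv,1) ^ rotl(inv,2) ^ rotl(inv,3) ^ rotl(inv,4) ^ '63'. */
void init_sboxes(void)
{
    for (int a = 0; a < 256; a++) {
        uint8_t inv = 0;
        for (int c = 1; a != 0 && c < 256; c++)      /* brute-force inverse */
            if (gmul((uint8_t)a, (uint8_t)c) == 1) {
                inv = (uint8_t)c;
                break;
            }
        uint8_t s = inv;
        for (int r = 1; r <= 4; r++)                 /* affine transformation */
            s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
        s ^= 0x63;
        sbox[a] = s;
        inv_sbox[s] = (uint8_t)a;                    /* Inv S-box: inverse permutation */
    }
}

uint8_t sub_byte(uint8_t a)     { return sbox[a]; }
uint8_t inv_sub_byte(uint8_t b) { return inv_sbox[b]; }
```

After init_sboxes(), sub_byte('00') gives the '63' quoted on the SubBytes slide.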
Basic Function 2a: ShiftRows()
• Each row is cyclically shifted to the left over a different offset: Rows 0, 1, 2, 3 are shifted over 0, 1, 2, 3 byte(s), respectively.
The numbers represent the positions of the corresponding bytes:
Before ShiftRows()      After ShiftRows()
 1  5  9 13              1  5  9 13
 2  6 10 14              6 10 14  2
 3  7 11 15             11 15  3  7
 4  8 12 16             16  4  8 12
Basic Function 2b: InvShiftRows()
• Similar to ShiftRows()
• Rows 0, 1, 2, 3 are shifted over 0, 3, 2, 1 byte(s), respectively.
After ShiftRows()       After InvShiftRows()
 1  5  9 13              1  5  9 13
 6 10 14  2              2  6 10 14
11 15  3  7              3  7 11 15
16  4  8 12              4  8 12 16
Notice that the positions are restored to the original ones when InvShiftRows() is applied after ShiftRows().
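Both permutations fit in a few lines of C. This sketch assumes the column-major layout implied by the slide's numbering: byte s[4*c + r] is in row r, column c, so slide position p corresponds to index p - 1.

```c
#include <stdint.h>
#include <string.h>

/* ShiftRows(): row r is rotated left by r bytes. Layout: s[4*c + r]. */
void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = s[4 * ((c + r) % 4) + r];
    memcpy(s, t, 16);
}

/* InvShiftRows(): row r is rotated right by r bytes (left by 4 - r). */
void inv_shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = s[4 * ((c - r + 4) % 4) + r];
    memcpy(s, t, 16);
}
```

Filling s with positions 1 to 16 and calling shift_rows() yields column 0 = 1, 6, 11, 16, as in the table; inv_shift_rows() then restores the original order.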
Basic Function 3a: MixColumns()
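• The following polynomial multiplication is performed for every column:
$$d(x) = c(x) \otimes a(x), \quad \text{where } c(x) = \text{'03'}x^3 + \text{'01'}x^2 + \text{'01'}x + \text{'02'}$$
i.e.,
$$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}$$
$$\begin{aligned} d_0 &= (\text{'02'} \bullet a_0) \oplus (\text{'03'} \bullet a_1) \oplus a_2 \oplus a_3 \\ d_1 &= a_0 \oplus (\text{'02'} \bullet a_1) \oplus (\text{'03'} \bullet a_2) \oplus a_3 \\ d_2 &= a_0 \oplus a_1 \oplus (\text{'02'} \bullet a_2) \oplus (\text{'03'} \bullet a_3) \\ d_3 &= (\text{'03'} \bullet a_0) \oplus a_1 \oplus a_2 \oplus (\text{'02'} \bullet a_3) \end{aligned}$$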
Basic Function 3b: InvMixColumns()
• Similar to MixColumns()
• Uses $c^{-1}(x)$ instead of $c(x)$:
$$c^{-1}(x) = \text{'0B'}x^3 + \text{'0D'}x^2 + \text{'09'}x + \text{'0E'}$$
xtime Approach for MixColumns()…
$$d_0 = \text{'02'} \bullet a_0 \oplus \text{'03'} \bullet a_1 \oplus \text{'01'} \bullet a_2 \oplus \text{'01'} \bullet a_3 = a_0 \oplus (a_0 \oplus a_1 \oplus a_2 \oplus a_3) \oplus \text{'02'} \bullet (a_0 \oplus a_1)$$
This is the 1st element; notice that a0 ⊕ a0 = 0 (XOR). The other three elements follow the same pattern with the indices rotated. The original a0 must be saved, because the last line needs it after a0 has been overwritten:
tmp = a0 ^ a1 ^ a2 ^ a3;
t0 = a0;
tm = a0 ^ a1; tm = xtime(tm); a0 ^= tm ^ tmp;
tm = a1 ^ a2; tm = xtime(tm); a1 ^= tm ^ tmp;
tm = a2 ^ a3; tm = xtime(tm); a2 ^= tm ^ tmp;
tm = a3 ^ t0; tm = xtime(tm); a3 ^= tm ^ tmp;
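The corrected xtime sequence translates directly into a self-contained C function for one column. It is a sketch for checking the arithmetic on a general-purpose CPU, not RC Array code; `mix_column` is a hypothetical name.

```c
#include <stdint.h>

static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* MixColumns() on one column with the xtime approach:
   d_i = a_i ^ (a0 ^ a1 ^ a2 ^ a3) ^ xtime(a_i ^ a_{i+1}). */
void mix_column(uint8_t a[4])
{
    uint8_t t0 = a[0];                           /* original a0, needed for d3 */
    uint8_t tmp = a[0] ^ a[1] ^ a[2] ^ a[3];
    a[0] ^= tmp ^ xtime(a[0] ^ a[1]);
    a[1] ^= tmp ^ xtime(a[1] ^ a[2]);
    a[2] ^= tmp ^ xtime(a[2] ^ a[3]);
    a[3] ^= tmp ^ xtime(a[3] ^ t0);
}
```

The column {db, 13, 53, 45} maps to {8e, 4d, a1, bc}, a widely used MixColumns test value.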
… and InvMixColumns()
$d(x) = c^{-1}(x) \otimes a(x)$, where $c^{-1}(x) = \text{'0B'}x^3 + \text{'0D'}x^2 + \text{'09'}x + \text{'0E'}$:
$$d_0 = \text{'0E'} \bullet a_0 \oplus \text{'0B'} \bullet a_1 \oplus \text{'0D'} \bullet a_2 \oplus \text{'09'} \bullet a_3 = a_0 \oplus (a_0 \oplus a_1 \oplus a_2 \oplus a_3) \oplus \text{'02'} \bullet (a_0 \oplus a_1) \oplus \text{'04'} \bullet (a_0 \oplus a_2) \oplus \text{'08'} \bullet (a_0 \oplus a_1 \oplus a_2 \oplus a_3)$$
This needs more xtime operations than MixColumns(). The original a0 and a1 are saved first, because later lines read them after they have been overwritten:
tmp1 = a0 ^ a1 ^ a2 ^ a3; tmp2 = xtime(xtime(xtime(tmp1)));
t0 = a0; t1 = a1;
tm1 = a0 ^ a2; tm2 = a0 ^ a1; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a0 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a1 ^ a3; tm2 = a1 ^ a2; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a1 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a2 ^ t0; tm2 = a2 ^ a3; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a2 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
tm1 = a3 ^ t1; tm2 = a3 ^ t0; tm1 = xtime(xtime(tm1)); tm2 = xtime(tm2);
a3 ^= tm1 ^ tm2 ^ tmp1 ^ tmp2;
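A C sketch of the same InvMixColumns() decomposition, reading every input from a copy of the column so that no overwritten value is reused. `inv_mix_column` is a hypothetical name.

```c
#include <stdint.h>

static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* InvMixColumns() on one column with the xtime approach:
   d_i = a_i ^ t ^ '02'(a_i ^ a_{i+1}) ^ '04'(a_i ^ a_{i+2}) ^ '08' t,
   where t = a0 ^ a1 ^ a2 ^ a3. */
void inv_mix_column(uint8_t a[4])
{
    uint8_t b[4] = {a[0], a[1], a[2], a[3]};     /* original inputs */
    uint8_t t = b[0] ^ b[1] ^ b[2] ^ b[3];
    uint8_t t8 = xtime(xtime(xtime(t)));         /* '08' * t */
    for (int i = 0; i < 4; i++) {
        uint8_t x2 = xtime(b[i] ^ b[(i + 1) % 4]);         /* '02'(a_i ^ a_{i+1}) */
        uint8_t x4 = xtime(xtime(b[i] ^ b[(i + 2) % 4]));  /* '04'(a_i ^ a_{i+2}) */
        a[i] = b[i] ^ t ^ x2 ^ x4 ^ t8;
    }
}
```

Applied to {8e, 4d, a1, bc}, it returns the MixColumns() input {db, 13, 53, 45}.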
Basic Function 4a: AddRoundKey()
• A 16-byte Round Key is added (i.e., bitwise XORed) to the 16-byte data block: each data byte is XORed with the Round Key byte in the same position.
Basic Function 4b: InvAddRoundKey()
• Exactly the same as AddRoundKey(), because subtraction is the same as addition (bitwise XOR).
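Because XOR is its own inverse, one routine serves as both AddRoundKey() and InvAddRoundKey(). A minimal sketch (`add_round_key` is a hypothetical name):

```c
#include <stdint.h>

/* AddRoundKey(): XOR each data byte with the Round Key byte in the same
   position. Calling it a second time with the same key undoes it. */
void add_round_key(uint8_t state[16], const uint8_t round_key[16])
{
    for (int i = 0; i < 16; i++)
        state[i] ^= round_key[i];
}
```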
Key Expansion for Encryption
• Key Expansion is the process of generating Round Keys from the Cipher Key. Total length of the Round Keys = Block Length × (Number of Rounds + 1).
– e.g., if the key size is 128 bits, 10 Rounds are needed, so 128 × 11 = 1408 bits of Round Keys are needed.
• Table lookup operations are needed. Tables:
  • S-box: 256 bytes
  • Rcon: 30 bytes
• Pseudocode: next slide
Pseudo Code for Key Expansion
KeyExpansion (Key[4*Nk], W[4*(Nr+1)], Nk)
{
    for (i = 0; i < Nk; i++)
        W[i] = (Key[4*i], Key[4*i+1], Key[4*i+2], Key[4*i+3]);
    for (i = Nk; i < 4*(Nr+1); i++) {
        temp = W[i-1];
        if (i % Nk == 0)
            temp = SubWord(RotWord(temp)) ^ Rcon[i/Nk];   /* Rcon byte is XORed into the first byte */
        else if (Nk == 8 && i % Nk == 4)
            temp = SubWord(temp);
        W[i] = W[i-Nk] ^ temp;
    }
}

SubWord (W(a, b, c, d))
{
    return W(S-box(a), S-box(b), S-box(c), S-box(d));
}

RotWord (W(a, b, c, d))
{
    return W(b, c, d, a);
}
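The pseudocode can be made runnable as the C sketch below. It makes two assumptions that the slides do not: the S-box is generated with the standard Rijndael construction instead of being stored, and the Rcon bytes are produced with xtime instead of being read from the 30-byte table. Words keep their first byte in the most significant position; all names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t word;                 /* first byte of the word in bits 31..24 */

static uint8_t sbox[256];

static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (; b; b >>= 1) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
    }
    return p;
}

/* Assumed S-box construction: inverse in GF(2^8), then the affine map with '63'. */
static void init_sbox(void)
{
    for (int a = 0; a < 256; a++) {
        uint8_t inv = 0;
        for (int c = 1; a != 0 && c < 256; c++)
            if (gmul((uint8_t)a, (uint8_t)c) == 1) {
                inv = (uint8_t)c;
                break;
            }
        uint8_t s = inv;
        for (int r = 1; r <= 4; r++)
            s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
        sbox[a] = s ^ 0x63;
    }
}

static word sub_word(word w)
{
    return (word)sbox[w >> 24] << 24 | (word)sbox[(w >> 16) & 0xFF] << 16 |
           (word)sbox[(w >> 8) & 0xFF] << 8 | (word)sbox[w & 0xFF];
}

static word rot_word(word w) { return (w << 8) | (w >> 24); }

/* Expands a Cipher Key of nk words (4, 6, or 8) into 4 * (nr + 1) words, nr = nk + 6. */
void key_expansion(const uint8_t *key, int nk, word *w)
{
    int nr = nk + 6;
    uint8_t rcon = 0x01;                                 /* Rcon[1] */
    init_sbox();
    for (int i = 0; i < nk; i++)
        w[i] = (word)key[4 * i] << 24 | (word)key[4 * i + 1] << 16 |
               (word)key[4 * i + 2] << 8 | key[4 * i + 3];
    for (int i = nk; i < 4 * (nr + 1); i++) {
        word temp = w[i - 1];
        if (i % nk == 0) {
            temp = sub_word(rot_word(temp)) ^ ((word)rcon << 24);
            rcon = (uint8_t)((rcon << 1) ^ ((rcon & 0x80) ? 0x1B : 0x00)); /* next Rcon */
        } else if (nk == 8 && i % nk == 4) {
            temp = sub_word(temp);
        }
        w[i] = w[i - nk] ^ temp;
    }
}
```

For a 128-bit key this produces 44 words (1408 bits), matching the count on the previous slide.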
Key Expansion for Decryption
• First, apply the same expansion procedure as for encryption.
• Second, apply InvMixColumns() to every Round Key except the first and the last one.
– There are (# of Rounds + 1) Round Keys in total, so (# of Rounds - 1) Round Keys are transformed by InvMixColumns().
• Tables needed:
– S-box: 256 bytes
– Rcon: 30 bytes
– Log: 256 bytes
– Alog: 256 bytes
Log and Alog are used for InvMixColumns(). The xtime approach is not used here, because Key Expansion is done by TinyRISC, which has enough memory to store the tables.
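A sketch of the second step, using Log/Alog multiplication as the slide describes for TinyRISC. Assumptions: the Round Keys are already expanded into an array of 16-byte keys stored column by column, and the tables use generator '03'. `decryption_round_keys` and the helpers are hypothetical names.

```c
#include <stdint.h>

static uint8_t Logtable[256], Alogtable[256];

static void init_log_tables(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        Alogtable[i] = x;
        Logtable[x] = (uint8_t)i;
        x ^= (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));  /* x = x * '03' */
    }
}

static uint8_t mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return Alogtable[(Logtable[a] + Logtable[b]) % 255];
}

/* rk[k] is Round Key k (16 bytes, 4 bytes per column). InvMixColumns() is
   applied to Round Keys 1 .. nr-1; the first and last are left unchanged. */
void decryption_round_keys(uint8_t rk[][16], int nr)
{
    init_log_tables();
    for (int k = 1; k < nr; k++)
        for (int c = 0; c < 4; c++) {
            uint8_t *a = &rk[k][4 * c];
            uint8_t b0 = a[0], b1 = a[1], b2 = a[2], b3 = a[3];
            a[0] = mul(0x0E, b0) ^ mul(0x0B, b1) ^ mul(0x0D, b2) ^ mul(0x09, b3);
            a[1] = mul(0x09, b0) ^ mul(0x0E, b1) ^ mul(0x0B, b2) ^ mul(0x0D, b3);
            a[2] = mul(0x0D, b0) ^ mul(0x09, b1) ^ mul(0x0E, b2) ^ mul(0x0B, b3);
            a[3] = mul(0x0B, b0) ^ mul(0x0D, b1) ^ mul(0x09, b2) ^ mul(0x0E, b3);
        }
}
```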
Part II: Mapping to MorphoSys
Important upgrades in M2
• M2 is the next-generation MorphoSys architecture.
• Some new features are important to the AES implementation:
– Every RC (Reconfigurable Cell) can perform table lookups locally, using an embedded 512-byte memory in each RC.
– The number of registers in each RC is increased from 4 to 8.
Key Expansion Implementation
• Done entirely by TinyRISC, the general-purpose RISC processor in MorphoSys
• The resulting Round Keys are saved in main (external) memory for later use
Issues in Data Processing Part
• RC Array Partition
• How to Do (Inv)ShiftRows()
• About (Inv)MixColumns()
Issue 1: RC Array Partition
• Compute in parallel
– 4 data blocks, or 64 bytes, are processed at the same time by the 64 RCs.
– RC Array partition: the scenario on the right is chosen.
  • Though not intuitive, it provides a natural data loading/storing order.
  • Thanks to the efficient interconnection network in MorphoSys, the subsequent data moves remain simple.
[Figure: two ways to partition the 8x8 RC Array among blocks 0 to 3. Left: four 4x4 quadrants, one per block. Right (chosen, marked "good!"): blocks 0, 1, 2, 3 side by side, each occupying two columns of the 8x8 array.]
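As an illustration of the chosen partition, the mapping below is a hypothetical sketch inferred from the data-move slides that follow, where one block fills Rows 0 to 7 of two RC columns with bytes 1 to 8 and 9 to 16. It is not taken from MorphoSys documentation. `rc_of` and `RCPos` are illustrative names.

```c
/* Hypothetical mapping (inferred, not documented): block b (0..3) uses RC
   Columns 2b and 2b+1; byte p (0..15, in loading order) goes to Row p % 8
   of Column 2b + p / 8. */
typedef struct { int row, col; } RCPos;

RCPos rc_of(int block, int byte)
{
    RCPos pos = { byte % 8, 2 * block + byte / 8 };
    return pos;
}
```

Under this assumption, the 64 bytes of four blocks cover each of the 64 RCs exactly once, and each block can be loaded column by column in its natural byte order.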
Issue 2: How to Do (Inv)ShiftRows()?
• In intermediate Rounds, (Inv)ShiftRows() is only a position adjustment for the subsequent (Inv)MixColumns().
– It is desirable for every RC to collect the data needed for (Inv)MixColumns(), i.e., the data in the same column, during (Inv)ShiftRows(). This makes (Inv)MixColumns() faster.
• In the final Round, however, no (Inv)MixColumns() follows (Inv)ShiftRows(), so a simpler data-move strategy is used there.
Data Move for ShiftRows() in Intermediate Rounds
• Eight steps are needed. The goal is shown below (only one block is shown; the order of the bytes within an RC is not important).
Initial layout:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  –  –  –   10  –  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  –  –  –   12  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  –  –  –   14  –  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  –  –  –   16  –  –  –
Goal:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  6 11 16    9 14  3  8
Row 1:   1  6 11 16    9 14  3  8
Row 2:   1  6 11 16    9 14  3  8
Row 3:   1  6 11 16    9 14  3  8
Row 4:   5 10 15  4   13  2  7 12
Row 5:   5 10 15  4   13  2  7 12
Row 6:   5 10 15  4   13  2  7 12
Row 7:   5 10 15  4   13  2  7 12
Intermediate ShiftRows(): Step 1
Express Lane, Row mode: $r^{3}_{1} \leftarrow r^{4}_{0}$, $r^{5}_{1} \leftarrow r^{0}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  –  –  –   10  –  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  5  –  –   12 13  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  1  –  –   14  9  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  –  –  –   16  –  –  –
Intermediate ShiftRows(): Step 2
Express Lane, Row mode: $r^{1}_{1} \leftarrow r^{6}_{0}$, $r^{7}_{1} \leftarrow r^{2}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  7  –  –   10 15  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  5  –  –   12 13  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  1  –  –   14  9  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  3  –  –   16 11  –  –
Intermediate ShiftRows(): Step 3, 4
Mux A = Left (r), Column mode: $r^{1}_{3} \leftarrow r^{0}_{1}$, $r^{0}_{3} \leftarrow r^{1}_{1}$ | $r^{1}_{2} \leftarrow r^{0}_{0}$, $r^{0}_{2} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Column $i$.
Result (all seeds are ready):
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  7 10 15   10 15  2  7
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  5  –  –   12 13  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  1  –  –   14  9  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  3 16 11   16 11  8  3
Intermediate ShiftRows(): Step 5
Input: the result of Steps 3, 4 (the seeds are in Rows 1, 3, 5, and 7).
Express Lane, Row mode: $r^{0,1,2,3}_{0} \leftarrow r^{5}_{0}$, $r^{4,5,6,7}_{0} \leftarrow r^{3}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   6  –  –  –   14  –  –  –
Row 1:   6  7 10 15   14 15  2  7
Row 2:   6  –  –  –   14  –  –  –
Row 3:   6  5  –  –   14 13  –  –
Row 4:   4  –  –  –   12  –  –  –
Row 5:   4  1  –  –   12  9  –  –
Row 6:   4  –  –  –   12  –  –  –
Row 7:   4  3 16 11   12 11  8  3
Intermediate ShiftRows(): Step 6
Input: the result of Step 5.
Express Lane, Row mode: $r^{0,1,2,3}_{1} \leftarrow r^{5}_{1}$, $r^{4,5,6,7}_{1} \leftarrow r^{3}_{1}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   6  1  –  –   14  9  –  –
Row 1:   6  1 10 15   14  9  2  7
Row 2:   6  1  –  –   14  9  –  –
Row 3:   6  1  –  –   14  9  –  –
Row 4:   4  5  –  –   12 13  –  –
Row 5:   4  5  –  –   12 13  –  –
Row 6:   4  5  –  –   12 13  –  –
Row 7:   4  5 16 11   12 13  8  3
Intermediate ShiftRows(): Step 7
Input: the result of Step 6.
Express Lane, Row mode: $r^{0,1,2,3}_{2} \leftarrow r^{7}_{2}$, $r^{4,5,6,7}_{2} \leftarrow r^{1}_{2}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   6  1 16  –   14  9  8  –
Row 1:   6  1 16 15   14  9  8  7
Row 2:   6  1 16  –   14  9  8  –
Row 3:   6  1 16  –   14  9  8  –
Row 4:   4  5 10  –   12 13  2  –
Row 5:   4  5 10  –   12 13  2  –
Row 6:   4  5 10  –   12 13  2  –
Row 7:   4  5 10 11   12 13  2  3
Intermediate ShiftRows(): Step 8
Input: the result of Step 7.
Express Lane, Row mode: $r^{0,1,2,3}_{3} \leftarrow r^{7}_{3}$, $r^{4,5,6,7}_{3} \leftarrow r^{1}_{3}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   6  1 16 11   14  9  8  3
Row 1:   6  1 16 11   14  9  8  3
Row 2:   6  1 16 11   14  9  8  3
Row 3:   6  1 16 11   14  9  8  3
Row 4:   4  5 10 15   12 13  2  7
Row 5:   4  5 10 15   12 13  2  7
Row 6:   4  5 10 15   12 13  2  7
Row 7:   4  5 10 15   12 13  2  7
Data Move for InvShiftRows() in Intermediate Rounds
• Similarly, eight steps are needed. The goal is shown below (only one block is shown; the order of the bytes within an RC is not important).
Initial layout: as for ShiftRows() (bytes 1 to 8 in r0 of Column 0, bytes 9 to 16 in r0 of Column 1, Rows 0 to 7).
Goal:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1 14 11  8    9  6  3 16
Row 1:   1 14 11  8    9  6  3 16
Row 2:   1 14 11  8    9  6  3 16
Row 3:   1 14 11  8    9  6  3 16
Row 4:   5  2 15 12   13 10  7  4
Row 5:   5  2 15 12   13 10  7  4
Row 6:   5  2 15 12   13 10  7  4
Row 7:   5  2 15 12   13 10  7  4
Intermediate InvShiftRows(): Step 1
Express Lane, Row mode: $r^{1}_{1} \leftarrow r^{4}_{0}$, $r^{7}_{1} \leftarrow r^{0}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  5  –  –   10 13  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  –  –  –   12  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  –  –  –   14  –  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  1  –  –   16  9  –  –
Intermediate InvShiftRows(): Step 2
Express Lane, Row mode: $r^{5}_{1} \leftarrow r^{2}_{0}$, $r^{3}_{1} \leftarrow r^{6}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  5  –  –   10 13  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  7  –  –   12 15  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  3  –  –   14 11  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  1  –  –   16  9  –  –
Intermediate InvShiftRows(): Step 3, 4
Mux A = Left (r), Column mode: $r^{1}_{3} \leftarrow r^{0}_{1}$, $r^{0}_{3} \leftarrow r^{1}_{1}$ | $r^{1}_{2} \leftarrow r^{0}_{0}$, $r^{0}_{2} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Column $i$.
Result (all seeds are ready):
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  5  –  –   10 13  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  7 12 15   12 15  4  7
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  3 14 11   14 11  6  3
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  1  –  –   16  9  –  –
Intermediate InvShiftRows(): Step 5
Input: the result of Steps 3, 4 (the seeds are in Rows 1, 3, 5, and 7).
Express Lane, Row mode: $r^{0,1,2,3}_{0} \leftarrow r^{7}_{0}$, $r^{4,5,6,7}_{0} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   8  –  –  –   16  –  –  –
Row 1:   8  5  –  –   16 13  –  –
Row 2:   8  –  –  –   16  –  –  –
Row 3:   8  7 12 15   16 15  4  7
Row 4:   2  –  –  –   10  –  –  –
Row 5:   2  3 14 11   10 11  6  3
Row 6:   2  –  –  –   10  –  –  –
Row 7:   2  1  –  –   10  9  –  –
Intermediate InvShiftRows(): Step 6
Input: the result of Step 5.
Express Lane, Row mode: $r^{0,1,2,3}_{1} \leftarrow r^{7}_{1}$, $r^{4,5,6,7}_{1} \leftarrow r^{1}_{1}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   8  1  –  –   16  9  –  –
Row 1:   8  1  –  –   16  9  –  –
Row 2:   8  1  –  –   16  9  –  –
Row 3:   8  1 12 15   16  9  4  7
Row 4:   2  5  –  –   10 13  –  –
Row 5:   2  5 14 11   10 13  6  3
Row 6:   2  5  –  –   10 13  –  –
Row 7:   2  5  –  –   10 13  –  –
Intermediate InvShiftRows(): Step 7
Input: the result of Step 6.
Express Lane, Row mode: $r^{0,1,2,3}_{2} \leftarrow r^{5}_{2}$, $r^{4,5,6,7}_{2} \leftarrow r^{3}_{2}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   8  1 14  –   16  9  6  –
Row 1:   8  1 14  –   16  9  6  –
Row 2:   8  1 14  –   16  9  6  –
Row 3:   8  1 14 15   16  9  6  7
Row 4:   2  5 12  –   10 13  4  –
Row 5:   2  5 12 11   10 13  4  3
Row 6:   2  5 12  –   10 13  4  –
Row 7:   2  5 12  –   10 13  4  –
Intermediate InvShiftRows(): Step 8
Input: the result of Step 7.
Express Lane, Row mode: $r^{0,1,2,3}_{3} \leftarrow r^{5}_{3}$, $r^{4,5,6,7}_{3} \leftarrow r^{3}_{3}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   8  1 14 11   16  9  6  3
Row 1:   8  1 14 11   16  9  6  3
Row 2:   8  1 14 11   16  9  6  3
Row 3:   8  1 14 11   16  9  6  3
Row 4:   2  5 12 15   10 13  4  7
Row 5:   2  5 12 15   10 13  4  7
Row 6:   2  5 12 15   10 13  4  7
Row 7:   2  5 12 15   10 13  4  7
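The eight-step schedules for ShiftRows() and InvShiftRows() can be checked with a small simulation. This is a sketch under stated assumptions. The `Block` model (registers r0 to r3 in an 8x2 region of RCs), the helper names, and the treatment of each step as one parallel move are illustrations, not the MorphoSys instruction set. The moves are transcribed from the step slides above. `expected()` derives the target bytes independently from the definition of (Inv)ShiftRows().

```c
#include <string.h>

/* Hypothetical model (not a MorphoSys API): one block occupies 8 Rows x 2
   Columns of RCs; reg[i][c][k] is register rk of the RC in Row i, Column c,
   holding a byte position 1..16 as numbered on the slides (0 means "–"). */
typedef struct { int reg[8][2][4]; } Block;

static void init_block(Block *b)
{
    memset(b, 0, sizeof *b);
    for (int c = 0; c < 2; c++)
        for (int i = 0; i < 8; i++)
            b->reg[i][c][0] = 8 * c + i + 1;     /* bytes 1-8 | 9-16 in r0 */
}

/* Row dst, register kd <- Row src, register ks, in both Columns. Reads come
   from the snapshot 'old', so all moves within one step act in parallel. */
static void row_copy(Block *b, const Block *old, int dst, int kd, int src, int ks)
{
    for (int c = 0; c < 2; c++)
        b->reg[dst][c][kd] = old->reg[src][c][ks];
}

/* Steps 3-4 (Column mode): r2, r3 <- r0, r1 of the other Column. */
static void cross_columns(Block *b, const Block *old, int row)
{
    for (int c = 0; c < 2; c++) {
        b->reg[row][c][2] = old->reg[row][1 - c][0];
        b->reg[row][c][3] = old->reg[row][1 - c][1];
    }
}

/* inverse = 0: ShiftRows() schedule; inverse = 1: InvShiftRows() schedule. */
static void intermediate_move(Block *b, int inverse)
{
    /* Steps 1-2: {dst Row, src Row} for r1 <- r0, two moves per step. */
    static const int seed[2][4][2] = {{{3, 4}, {5, 0}, {1, 6}, {7, 2}},
                                      {{1, 4}, {7, 0}, {5, 2}, {3, 6}}};
    /* Steps 5-8: source Row for Rows 0-3 and for Rows 4-7, register rk. */
    static const int bcast[2][4][2] = {{{5, 3}, {5, 3}, {7, 1}, {7, 1}},
                                       {{7, 1}, {7, 1}, {5, 3}, {5, 3}}};
    Block old;
    for (int s = 0; s < 2; s++) {
        old = *b;
        for (int m = 2 * s; m < 2 * s + 2; m++)
            row_copy(b, &old, seed[inverse][m][0], 1, seed[inverse][m][1], 0);
    }
    old = *b;
    cross_columns(b, &old, inverse ? 3 : 1);
    cross_columns(b, &old, inverse ? 5 : 7);
    for (int k = 0; k < 4; k++) {
        old = *b;
        for (int i = 0; i < 8; i++)
            row_copy(b, &old, i, k, bcast[inverse][k][i < 4 ? 0 : 1], k);
    }
}

/* Byte position that (Inv)ShiftRows() places in row r of state column j. */
static int expected(int j, int r, int inverse)
{
    int src = inverse ? (j - r + 4) % 4 : (j + r) % 4;
    return 4 * src + r + 1;
}

/* Returns 1 if every RC in Rows 0-3 (4-7) of Column c ends up holding the
   four bytes of state column 2c (2c+1), in any order. */
int check_intermediate(int inverse)
{
    Block b;
    init_block(&b);
    intermediate_move(&b, inverse);
    for (int c = 0; c < 2; c++)
        for (int i = 0; i < 8; i++)
            for (int r = 0; r < 4; r++) {
                int want = expected(2 * c + (i >= 4), r, inverse), hits = 0;
                for (int k = 0; k < 4; k++)
                    hits += (b.reg[i][c][k] == want);
                if (hits != 1)
                    return 0;
            }
    return 1;
}
```

Both check_intermediate(0) and check_intermediate(1) return 1 for the transcribed schedules. This confirms that after Step 8 every RC holds exactly the column it needs for (Inv)MixColumns().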
Data Move for ShiftRows() in Final Round
• Five steps are needed. The goal is shown below (only one block is shown; only r0 is used for the result).
Initial layout: as in intermediate Rounds (bytes 1 to 8 in r0 of Column 0, bytes 9 to 16 in r0 of Column 1, Rows 0 to 7).
Goal:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   6  –  –  –   14  –  –  –
Row 2:  11  –  –  –    3  –  –  –
Row 3:  16  –  –  –    8  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:  10  –  –  –    2  –  –  –
Row 6:  15  –  –  –    7  –  –  –
Row 7:   4  –  –  –   12  –  –  –
Final ShiftRows(): Step 1, 2
Express Lane, Row mode: $r^{3}_{1} \leftarrow r^{7}_{0}$, $r^{7}_{1} \leftarrow r^{3}_{0}$ | $r^{1}_{1} \leftarrow r^{5}_{0}$, $r^{5}_{1} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  6  –  –   10 14  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  8  –  –   12 16  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  2  –  –   14 10  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  4  –  –   16 12  –  –
Final ShiftRows(): Step 3, 4
Input: the result of Steps 1, 2.
Mux A = Left (r), Column mode: $r^{1}_{3} \leftarrow r^{0}_{1}$, $r^{0}_{3} \leftarrow r^{1}_{1}$ | $r^{1}_{2} \leftarrow r^{0}_{0}$, $r^{0}_{2} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Column $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  6  –  –   10 14  –  –
Row 2:   3  – 11  –   11  –  3  –
Row 3:   4  8  – 16   12 16  –  8
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  2  – 10   14 10  –  2
Row 6:   7  – 15  –   15  –  7  –
Row 7:   8  4  –  –   16 12  –  –
Final ShiftRows(): Step 5
Input: the result of Steps 3, 4.
Row mode: each RC copies one register into r0:
Row 0: $r_0 \leftarrow r_0$; Row 1: $r_0 \leftarrow r_1$; Row 2: $r_0 \leftarrow r_2$; Row 3: $r_0 \leftarrow r_3$; Row 4: $r_0 \leftarrow r_0$; Row 5: $r_0 \leftarrow r_3$; Row 6: $r_0 \leftarrow r_2$; Row 7: $r_0 \leftarrow r_1$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   6  –  –  –   14  –  –  –
Row 2:  11  –  –  –    3  –  –  –
Row 3:  16  –  –  –    8  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:  10  –  –  –    2  –  –  –
Row 6:  15  –  –  –    7  –  –  –
Row 7:   4  –  –  –   12  –  –  –
Data Move for InvShiftRows() in Final Round
• Five steps are needed. The goal is shown below (only one block is shown; only r0 is used for the result).
Initial layout: as in intermediate Rounds (bytes 1 to 8 in r0 of Column 0, bytes 9 to 16 in r0 of Column 1, Rows 0 to 7).
Goal:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:  14  –  –  –    6  –  –  –
Row 2:  11  –  –  –    3  –  –  –
Row 3:   8  –  –  –   16  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   2  –  –  –   10  –  –  –
Row 6:  15  –  –  –    7  –  –  –
Row 7:  12  –  –  –    4  –  –  –
Final InvShiftRows(): Step 1, 2
Express Lane, Row mode: $r^{3}_{1} \leftarrow r^{7}_{0}$, $r^{7}_{1} \leftarrow r^{3}_{0}$ | $r^{1}_{1} \leftarrow r^{5}_{0}$, $r^{5}_{1} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Row $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  6  –  –   10 14  –  –
Row 2:   3  –  –  –   11  –  –  –
Row 3:   4  8  –  –   12 16  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  2  –  –   14 10  –  –
Row 6:   7  –  –  –   15  –  –  –
Row 7:   8  4  –  –   16 12  –  –
Final InvShiftRows(): Step 3, 4
Input: the result of Steps 1, 2.
Mux A = Left (r), Column mode: $r^{1}_{3} \leftarrow r^{0}_{1}$, $r^{0}_{3} \leftarrow r^{1}_{1}$ | $r^{1}_{2} \leftarrow r^{0}_{0}$, $r^{0}_{2} \leftarrow r^{1}_{0}$, where $r^{i}_{k}$ means $r_k$ in Column $i$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:   2  6  – 14   10 14  –  6
Row 2:   3  – 11  –   11  –  3  –
Row 3:   4  8  –  –   12 16  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   6  2  –  –   14 10  –  –
Row 6:   7  – 15  –   15  –  7  –
Row 7:   8  4  – 12   16 12  –  4
Final InvShiftRows(): Step 5
Input: the result of Steps 3, 4.
Row mode: each RC copies one register into r0:
Row 0: $r_0 \leftarrow r_0$; Row 1: $r_0 \leftarrow r_3$; Row 2: $r_0 \leftarrow r_2$; Row 3: $r_0 \leftarrow r_1$; Row 4: $r_0 \leftarrow r_0$; Row 5: $r_0 \leftarrow r_1$; Row 6: $r_0 \leftarrow r_2$; Row 7: $r_0 \leftarrow r_3$.
Result:
        Column 0      Column 1
        r0 r1 r2 r3   r0 r1 r2 r3
Row 0:   1  –  –  –    9  –  –  –
Row 1:  14  –  –  –    6  –  –  –
Row 2:  11  –  –  –    3  –  –  –
Row 3:   8  –  –  –   16  –  –  –
Row 4:   5  –  –  –   13  –  –  –
Row 5:   2  –  –  –   10  –  –  –
Row 6:  15  –  –  –    7  –  –  –
Row 7:  12  –  –  –    4  –  –  –
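The five-step final-Round schedules can be checked the same way. This self-contained sketch reuses the same hypothetical `Block` model: an 8x2 region of RCs with registers r0 to r3, each step treated as one parallel move, and all names illustrative rather than MorphoSys APIs. It verifies that r0 of every RC ends up holding the byte that (Inv)ShiftRows() assigns to it.

```c
#include <string.h>

/* Hypothetical model (not a MorphoSys API): reg[i][c][k] is register rk of
   the RC in Row i, Column c; values are byte positions 1..16 (0 means "–"). */
typedef struct { int reg[8][2][4]; } Block;

static void init_block(Block *b)
{
    memset(b, 0, sizeof *b);
    for (int c = 0; c < 2; c++)
        for (int i = 0; i < 8; i++)
            b->reg[i][c][0] = 8 * c + i + 1;
}

/* Row mode: Row dst, register kd <- Row src, register ks (both Columns). */
static void row_copy(Block *b, const Block *old, int dst, int kd, int src, int ks)
{
    for (int c = 0; c < 2; c++)
        b->reg[dst][c][kd] = old->reg[src][c][ks];
}

/* Column mode: register k <- register k-2 of the other Column (k = 2 or 3). */
static void cross(Block *b, const Block *old, int row, int k)
{
    for (int c = 0; c < 2; c++)
        b->reg[row][c][k] = old->reg[row][1 - c][k - 2];
}

/* inverse = 0: final ShiftRows(); inverse = 1: final InvShiftRows(). */
static void final_move(Block *b, int inverse)
{
    static const int r3_rows[2][2] = {{3, 5}, {1, 7}};        /* Rows that fill r3 */
    static const int pick[2][8] = {{0, 1, 2, 3, 0, 3, 2, 1},   /* step 5: r0 <- r_pick */
                                   {0, 3, 2, 1, 0, 1, 2, 3}};
    Block old = *b;                                           /* steps 1, 2 */
    row_copy(b, &old, 3, 1, 7, 0);
    row_copy(b, &old, 7, 1, 3, 0);
    row_copy(b, &old, 1, 1, 5, 0);
    row_copy(b, &old, 5, 1, 1, 0);
    old = *b;                                                 /* steps 3, 4 */
    cross(b, &old, 2, 2);
    cross(b, &old, 6, 2);
    cross(b, &old, r3_rows[inverse][0], 3);
    cross(b, &old, r3_rows[inverse][1], 3);
    old = *b;                                                 /* step 5 */
    for (int i = 0; i < 8; i++)
        for (int c = 0; c < 2; c++)
            b->reg[i][c][0] = old.reg[i][c][pick[inverse][i]];
}

/* Byte position that (Inv)ShiftRows() places in row r of state column j. */
static int expected(int j, int r, int inverse)
{
    int src = inverse ? (j - r + 4) % 4 : (j + r) % 4;
    return 4 * src + r + 1;
}

/* Returns 1 if r0 of the RC in Row i, Column c holds row i % 4 of state column 2c + i / 4. */
int check_final(int inverse)
{
    Block b;
    init_block(&b);
    final_move(&b, inverse);
    for (int c = 0; c < 2; c++)
        for (int i = 0; i < 8; i++)
            if (b.reg[i][c][0] != expected(2 * c + i / 4, i % 4, inverse))
                return 0;
    return 1;
}
```

Only Steps 4 and 5 differ between the two directions: which Rows fill r3, and which register each Row selects into r0.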
Issue 3: (Inv)MixColumns()
• The xtime approach is used. Since all the data involved are held in local registers, the implementation is straightforward.
• Because InvMixColumns() needs more xtime and XOR operations, it is slower than MixColumns().
• Decryption might seem to need many more contexts than encryption because of the more complex InvMixColumns(). However, with careful arrangement of registers and table lookup operations, decryption needs only one more context.
Part III: Performance Evaluation
Key Expansion Comparison
# of cycles for Key Expansion:
Key Size | AES CD (ANSI C): Cipher | AES CD (ANSI C): Inverse Cipher | Brian Gladman (VC++): Cipher | Brian Gladman (VC++): Inverse Cipher | MorphoSys TinyRISC: Cipher | MorphoSys TinyRISC: Inverse Cipher
128 | 2100 | 2900 | 305 | 1389 | 2770 | 13320
192 | 2600 | 3600 | 277 | 1595 | 3386 | 15603
256 | 2800 | 3800 | 374 | 1960 | 4196 | 19184
The statistics for ANSI C and C++ are taken from the AES proposal by Rijndael's authors.
Notice:
• The cycle time (clock period) of the ANSI C and C++ platforms is not given here.
• Key Expansion time is not critical. For MorphoSys M2, the designed clock frequency is 200 MHz, so even the largest figure, 19184 cycles, takes only about 96 µs.
Initialization Time: MorphoSys
Key Size | Key Expansion | Table Loading | Context Loading | Total # of cycles
128 | 2770/13320 | 6249 | 230/238 | 9249/19807
192 | 3386/16029 | 6249 | 230/238 | 9865/22516
256 | 4196/19184 | 6249 | 230/238 | 10675/25671
Notice:
• Initialization includes three parts: Key Expansion, Table Loading, and Context Loading. They are done only once per session.
• The Initialization time is very short: at f = 200 MHz, the longest is about 128 µs (25671 cycles).
x/y: # of cycles for encryption/decryption
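The Total column is the sum of the three parts, and the time follows from dividing cycles by the clock frequency in MHz. A tiny sketch of that arithmetic (hypothetical helper names):

```c
/* Initialization cycles = Key Expansion + Table Loading + Context Loading. */
long init_cycles(long key_expansion, long table_loading, long context_loading)
{
    return key_expansion + table_loading + context_loading;
}

/* Time in microseconds for a given cycle count at f_mhz MHz. */
double cycles_to_us(long cycles, double f_mhz)
{
    return cycles / f_mhz;
}
```

For example, the 256-bit decryption case gives 19184 + 6249 + 238 = 25671 cycles, about 128.4 µs at 200 MHz.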
Data Processing: MorphoSys
Key Size | Encryption: # of cycles | Encryption: Xput | Decryption: # of cycles | Decryption: Xput
128 | 150.25 | 170.4 Mb/s | 166 | 154.2 Mb/s
192 | 175.25 | 146.1 Mb/s | 194.5 | 131.6 Mb/s
256 | 200.25 | 127.8 Mb/s | 223 | 114.8 Mb/s
Notice:
• These statistics are for the data processing part only.
• Because MorphoSys processes four blocks at the same time, the effective number of cycles for one block is 1/4 of the computing cycles; the cycle counts above are these per-block values.
• The throughput (Xput) is calculated at f = 200 MHz.
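The Xput column can be reproduced from the per-block cycle counts. Assuming 128-bit blocks, throughput in Mb/s = 128 × f (MHz) / (cycles per block). A one-line sketch (`throughput_mbps` is a hypothetical name):

```c
/* Throughput in Mb/s for 128-bit blocks at f_mhz MHz, given the effective
   number of cycles per block (for MorphoSys: cycles for four blocks / 4). */
double throughput_mbps(double cycles_per_block, double f_mhz)
{
    return 128.0 * f_mhz / cycles_per_block;
}
```

For instance, throughput_mbps(150.25, 200) is about 170.4, the 128-bit encryption entry above.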
Data Processing: Other Implementations
Key Size | Intel 8051: # of cycles | Motorola 68HC08: # of cycles | AES CD (ANSI C): # of cycles | AES CD (ANSI C): Xput (Mb/s) | Brian Gladman (VC++): # of cycles | Brian Gladman (VC++): Xput (Mb/s) | Java: # of cycles | Java: Xput (Mb/s)
128 | 4065 | 8390 | 950 | 27.0 | 363 | 70.5 | 23000 | 1.1
192 | 4512 | 10780 | 1125 | 22.8 | 432 | 59.3 | 27600 | 0.93
256 | 5221 | 12490 | 1295 | 19.8 | 500 | 51.2 | 32300 | 0.79
Notice:
• The throughput is calculated at f = 200 MHz.
• All of these are much slower than the MorphoSys implementation.
Comparison: Data Processing
[Figure: bar chart "# of Cycles in Different Implementations" (y-axis 0 to 25000 cycles) comparing Intel 8051, Motorola 68HC08, ANSI C, C++, Java, and MorphoSys.]
Another Fast Implementation
• On Aug 8, 2001, Amphion Semiconductor Ltd. announced its application-specific cores for AES. They are faster than the MorphoSys implementation.
• However, the following issues should also be considered:
– Encryption and decryption need different Amphion cores.
– The initialization time of the Amphion cores is unknown (though this matters little as long as it is not very long).
– MorphoSys is not just an ASIC or FPGA: it can run many other applications efficiently on the same architecture.
Amphion ASIC Cores
Key Size | Logic Gates | Encryption: Timing Constraint (MHz) | Encryption: Throughput (Mb/s) | Decryption: Timing Constraint (MHz) | Decryption: Throughput (Mb/s)
128 | 18.2K | 200 | 581 | 200 | 581
192 | 18.2K | 200 | 492 | 200 | 492
256 | 18.2K | 200 | 426 | 200 | 426
Technology: TSMC 0.18 µm
Notice:
• About 240% to 270% faster than the MorphoSys implementation
Amphion FPGA Cores (1)
Key Size | Logic Used (LE)* | Memory Used (ESB) | Encryption: Clock Speed (MHz) | Encryption: Throughput (Mb/s) | Decryption: Clock Speed (MHz) | Decryption: Throughput (Mb/s)
128 | 1452/1560 | 8 | 77.8 | 226 | 74.1 | 215
192 | 1452/1560 | 8 | 77.8 | 191 | 74.1 | 182
256 | 1452/1560 | 8 | 77.8 | 166 | 74.1 | 158
Device: Altera APEX20KE-1
Notice:
• About 30% faster than the MorphoSys implementation
Amphion FPGA Cores (2)
Key Size | Logic Used (LUT)* | Memory Used (BRAM) | Encryption: Clock Speed (MHz) | Encryption: Throughput (Mb/s) | Decryption: Clock Speed (MHz) | Decryption: Throughput (Mb/s)
128 | 1008/1092 | 4 | 92.3 | 268 | 86.7 | 254
192 | 1008/1092 | 8 | 92.3 | 227 | 86.7 | 213
256 | 1008/1092 | 8 | 92.3 | 196 | 86.7 | 184
Device: Xilinx VirtexE-8
Notice:
• About 60% faster than the MorphoSys implementation