The Advanced Encryption Standard on a Reconfigurable Computer
By
Pradeep Kancharla
Bachelor of Engineering
Osmania University, 2001
-------------------------------------------------------------------
Submitted in Partial Fulfillment of the
requirements for the Degree of Master of Science
in the Department of Computer Science and Engineering
University of South Carolina
2003
____________________________
Department of Computer Science and Engineering
Director of Thesis

____________________________
Department of Computer Science and Engineering
First Reader

____________________________
Department of Computer Science and Engineering
Second Reader

____________________________
Dean of the Graduate School
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my advisor, Dr. Duncan A. Buell, for his untiring guidance and encouragement, which made this thesis possible. I would like to thank my research group, the Reconfig, for their support during the preparation of the thesis. Last but not least, I wish to express my deepest appreciation and gratitude to my parents and my sister in India for all their love and unfailing support throughout these years.
Table of Contents
1. The Advanced Encryption Standard
2. The HC 36m – A Reconfigurable Computer
3. VHDL Implementation
4. Viva Implementation
5. Results and Conclusions
6. References
7. Appendix A: C code used for testing
8. Appendix B: VHDL Implementation
List of Figures
1. Example of 128 bit State
2. Pseudo-C Code for Encryption
3. Pseudo-C Code for Decryption
4. Affine transformation in ByteSub
5. Polynomial multiplication using Matrices
6. Pseudo-C Implementation of Key Schedule
7. Quad Structure
8. Architecture of HC 36m
9. Corelib
10. Snapshot of Viva
11. Input from a file to an input horn
12. Format of a file input
13. Lookup table
14. Mux with pathnames given manually
15. Mux with pathnames pointing to a pointer
16. Setting a file pointer to a specific path
17. Design of a round in Encryption
18. Design of a round in Decryption
19. Design of a round in Encryption
20. Multiplication by x of ‘02’
21. cmmix object
22. Design of a round in Encryption
23. Design of a round in Decryption
List of Tables
1. Number of rounds
2. Offsets of rows
3. Par Report for VHDL implementation
4. Results of rounds and stages in iterative approach
5. Results of rounds and stages in Non-iterative approach
6. Architectures implemented on HC 36m
7. Results of architectures in Viva 2.2 and Viva 2.3
8. Throughput for different architectures
9. Other Implementations on Virtex family chips
Chapter 1: The Advanced Encryption Standard
Introduction:
In a world where many transactions are done over networks, attacks on the security of the
data over the network have become a major concern. Cryptography is used as a tool to
counter these attacks. With ever expanding technology and the increase in speeds of
microprocessor chips, DES (the Data Encryption Standard) had, by the late 1990s,
become obsolete.
In 1997, the United States National Institute of Standards and Technology (NIST)
initiated a new Encryption standard, called the Advanced Encryption Standard, which
was to replace DES as the Federal Information Processing Standard (FIPS). In October
2000, after an extensive search process, the block cipher Rijndael was selected
as the new Advanced Encryption Standard. The algorithm was designed by Vincent
Rijmen and Joan Daemen.
The Rijndael Algorithm:
Rijndael is an iterated block cipher and can have variable block and key lengths. The
block length and the key length can each be any of 128, 192, or 256 bits. The block and
the intermediate cipher can be envisioned as a two-dimensional array of four rows called
the State. The number of columns varies depending on the bit length.
a1 a5 a9 a13
a2 a6 a10 a14
a3 a7 a11 a15
a4 a8 a12 a16
Fig 1: Example of 128 bit State
All the operations in Rijndael are performed either on bytes or on 4-byte words, where
bytes represent elements in the finite field, or Galois field, GF(2^8). The 4-byte words are
the columns of the State. The key is also viewed in the format above. The input to the
cipher, also known as plaintext, is a one-dimensional array of 16, 24, or 32 bytes
depending on the block size. These bytes are mapped into the States in column order.
For example, in the case of a 128-bit block, the bytes of the plaintext are filled into the
cells in the order a1, a2, a3, a4, a5 … a16. The key is also filled into a two-dimensional
array in the same manner.
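The column-order mapping described above can be sketched in C as follows. This is only a minimal illustration for the 128-bit block; the names load_state and store_state are chosen here for illustration and are not taken from the reference code.

```c
#include <stdint.h>

/* Illustrative sketch for a 128-bit block: 4 rows by 4 columns. */
#define ROWS 4
#define COLS 4

/* Fill the State column by column: block[0..3] becomes the first column. */
void load_state(const uint8_t block[ROWS * COLS], uint8_t state[ROWS][COLS])
{
    for (int col = 0; col < COLS; col++)
        for (int row = 0; row < ROWS; row++)
            state[row][col] = block[ROWS * col + row];
}

/* Read the State back out in the same column order. */
void store_state(uint8_t state[ROWS][COLS], uint8_t block[ROWS * COLS])
{
    for (int col = 0; col < COLS; col++)
        for (int row = 0; row < ROWS; row++)
            block[ROWS * col + row] = state[row][col];
}
```

With the plaintext bytes numbered 1 to 16, the first column of the State holds bytes 1 to 4, which corresponds to cells a1 to a4 of Fig 1.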
Rijndael is an iterative algorithm. A different key, derived from the initial key, is used in each
of the iterations, called rounds. The number of rounds depends on the key and block
lengths. The following table gives the number of rounds to be performed based on the
block length (BL) and key length (KL) in bits.

Number of rounds   BL = 128   BL = 192   BL = 256
KL = 128              10         12         14
KL = 192              12         12         14
KL = 256              14         14         14

Table 1: Number of Rounds
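The entries of Table 1 are consistent with the rule given in the Rijndael specification, nr = max(nk, nb) + 6, where nk and nb are the key and block lengths in 32-bit words. A small sketch of that rule follows; the function name num_rounds is a hypothetical name used only for this example.

```c
/* Number of rounds from the key and block lengths given in bits.
   Sketch only: assumes both lengths are one of 128, 192, or 256. */
int num_rounds(int key_bits, int block_bits)
{
    int nk = key_bits / 32;    /* key length in 32-bit words   */
    int nb = block_bits / 32;  /* block length in 32-bit words */
    int larger = (nk > nb) ? nk : nb;
    return larger + 6;
}
```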
Each round except the final round consists of four different transformations. They are
ByteSub, ShiftRow, MixColumn and the Round Key Addition. The final
round does not contain MixColumn. The Round Key, which is used in the Round
Key Addition, is derived from the cipher key through a process called Key
Schedule. This can be done initially before the rounds or in parallel with the rounds.
The algorithm for Encryption and Decryption is given in pseudo-C code below. The
number of rounds in the code depends on the bit lengths of key and plaintext.
    Key Schedule (Initial Cipher Key, Expanded Round Key);
    Round Key Addition (State, Round Key);
    for (I = 0; I < Number of Rounds; I++)
    {
        ByteSub (State);
        ShiftRow (State);
        if (!Final Round) MixColumn (State);
        Round Key Addition (State, Round Key);
    }
Fig 2: Pseudo-C code for Encryption
    Key Schedule (Initial Cipher Key, Expanded Round Key);
    for (I = 0; I < Number of Rounds; I++)
    {
        Round Key Addition (State, Round Key);
        if (I != 0) InvMixColumn (State);
        InvByteSub (State);
        InvShiftRow (State);
    }
    Round Key Addition (State, Round Key);
Fig 3: Pseudo-C code for Decryption
In the Key Schedule, the initial key is expanded to a length equal to the block length
multiplied by one more than the number of rounds. This produces a different Round Key
for each round, which is used in Round Key Addition. As Decryption is just
the inverse of Encryption, our emphasis will be on Encryption, with a further explanation
of the differences for Decryption wherever required.
ByteSub Transformation:
This transformation works independently on each of the cells of the State. The
transformation consists of two parts. First, the multiplicative inverse of the byte is
calculated, followed by an affine transformation. The affine transformation to be applied
is given below:
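\[
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix}
+
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\]
Here x_0 and y_0 are the least significant bits of the input and output bytes.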
Fig 4: Affine transformation in ByteSub [4]
All the operations are done in GF(2^8). In taking the multiplicative inverse, ‘00’ is mapped
onto itself. In the case of Decryption, called InvByteSub, the inverse of the affine
mapping above is applied, followed by taking the multiplicative inverse.
Since the bitwise operations in GF(2^8) are hard to implement in software, a different
approach is used in the actual implementation.
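To make the two parts of ByteSub concrete, the following sketch computes a substituted byte directly, without a lookup table. It assumes the standard Rijndael reduction polynomial x^8 + x^4 + x^3 + x + 1 and the standard affine constant ‘63’; the function names are illustrative and this is not the table-based method used in the implementations described later.

```c
#include <stdint.h>

/* Bitwise multiplication in GF(2^8), reduced by x^8 + x^4 + x^3 + x + 1. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse as a^254; this maps 00 onto itself. */
uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++) r = gf_mul(r, a);
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* ByteSub: inverse first, then the affine transformation. */
uint8_t byte_sub(uint8_t a)
{
    uint8_t x = gf_inv(a);
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63);
}

/* InvByteSub: inverse affine transformation first, then the inverse. */
uint8_t inv_byte_sub(uint8_t s)
{
    uint8_t x = (uint8_t)(rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05);
    return gf_inv(x);
}
```

Evaluating byte_sub for all 256 inputs yields the contents of a substitution lookup table, which is why a single table lookup can replace both steps.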
ShiftRow Transformation:
This transformation is applied independently to all the four rows. Each row is cyclically
shifted left by a different offset. The first row is not shifted at all. The offsets of each
row are determined by the block length. The following table gives the offsets in terms of
columns to be moved for varying block sizes.
Shift offsets   Row 2   Row 3   Row 4
BL = 128          1       2       3
BL = 192          1       2       3
BL = 256          1       3       4
Table 2: Offsets of rows based on block lengths
In case of Decryption, called InvShiftRow, the rows are shifted back to nullify the
effect. That is, the rows are cyclically shifted left with offset equal to number of columns
of State minus the offset for Encryption.
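A minimal sketch of ShiftRow and InvShiftRow for the 128-bit block is given below. The inverse is written exactly as described above, as a left shift by the number of columns minus the Encryption offset. The names are illustrative, and only the BL = 128 row of Table 2 is assumed.

```c
#include <stdint.h>

#define NB 4  /* number of State columns for a 128-bit block */

/* Offsets for rows 1 to 4 when BL = 128 (the first row is not shifted). */
static const int shift_offset[4] = {0, 1, 2, 3};

/* Cyclically shift one row to the left by the given offset. */
static void rotate_left(uint8_t row[NB], int offset)
{
    uint8_t tmp[NB];
    for (int c = 0; c < NB; c++) tmp[c] = row[(c + offset) % NB];
    for (int c = 0; c < NB; c++) row[c] = tmp[c];
}

void shift_row(uint8_t state[4][NB])
{
    for (int r = 0; r < 4; r++)
        rotate_left(state[r], shift_offset[r]);
}

/* A left shift by NB minus the offset undoes the Encryption shift. */
void inv_shift_row(uint8_t state[4][NB])
{
    for (int r = 0; r < 4; r++)
        rotate_left(state[r], (NB - shift_offset[r]) % NB);
}
```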
MixColumn Transformation:
This transformation is applied independently on each column of the State. Each column
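of the State is treated as a polynomial over GF(2^8). For example, the first column in Fig 1,
with a1 as its top cell, can be treated as a4 x^3 + a3 x^2 + a2 x + a1. This polynomial is
multiplied by the fixed polynomial e(x) = 03 x^3 + 01 x^2 + 01 x + 02, modulo x^4 + 1.
This can be written as a matrix multiplication as follows, where b1 to b4 are the cells of the
transformed column:
\[
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix}
=
\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}
\]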
Fig 5: Polynomial multiplication using matrices [4]
In the case of Decryption, called InvMixColumn, each column is multiplied by the
polynomial d(x) = 0B x^3 + 0D x^2 + 09 x + 0E, chosen so that e(x) d(x) = 1 modulo x^4 + 1.
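The column multiplication and its inverse can be sketched as follows, using bitwise GF(2^8) multiplication rather than lookup tables. The sketch assumes the standard Rijndael coefficients (02, 03, 01, 01 for MixColumn and 0E, 0B, 0D, 09 for InvMixColumn); the function names are illustrative.

```c
#include <stdint.h>

/* Bitwise multiplication in GF(2^8), reduced by x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* MixColumn on one column; col[0] is the top cell of the column. */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(gf_mul(0x02, a0) ^ gf_mul(0x03, a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ gf_mul(0x02, a1) ^ gf_mul(0x03, a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ gf_mul(0x02, a2) ^ gf_mul(0x03, a3));
    col[3] = (uint8_t)(gf_mul(0x03, a0) ^ a1 ^ a2 ^ gf_mul(0x02, a3));
}

/* InvMixColumn on one column, using the coefficients of d(x). */
void inv_mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(gf_mul(0x0E, a0) ^ gf_mul(0x0B, a1) ^ gf_mul(0x0D, a2) ^ gf_mul(0x09, a3));
    col[1] = (uint8_t)(gf_mul(0x09, a0) ^ gf_mul(0x0E, a1) ^ gf_mul(0x0B, a2) ^ gf_mul(0x0D, a3));
    col[2] = (uint8_t)(gf_mul(0x0D, a0) ^ gf_mul(0x09, a1) ^ gf_mul(0x0E, a2) ^ gf_mul(0x0B, a3));
    col[3] = (uint8_t)(gf_mul(0x0B, a0) ^ gf_mul(0x0D, a1) ^ gf_mul(0x09, a2) ^ gf_mul(0x0E, a3));
}
```

Applying inv_mix_column after mix_column returns the original column, which is the matrix form of e(x) d(x) = 1.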
Round Key Addition:
In this transformation, the Round Key is added to the State. Addition in GF(2^8) is a
simple bitwise XOR. The Round Key is of the same length as the State. It is derived from
the initial cipher key by means of the Key Schedule.
Key Schedule:
The Key Schedule is the process of deriving the Round Key for each round from the
initial cipher key. This involves expansion of the initial key followed by selection of the
key for each round. The Round Key Addition is done once every round and an
additional Round Key Addition is done, before the rounds in the case of
Encryption, and after in Decryption. Since Round Key should be the same length as
Block, the total number of Round Key bits, called the Expanded Round Key, must
be the block length times one greater than the number of rounds. A pseudo-C
implementation of Key Schedule is explained below. The expanded key can be viewed as
an array of 32-bit words represented as W[nb*(nr+1)], where nb is the number of
columns in the State and nr is the number of rounds.
Key expansion is done differently for different key sizes. Let nk be the number of 32-bit
words in the key. The function subbyte takes a 32-bit word, does a byte
substitution on each of its bytes, and returns a 32-bit word. The function rotbyte performs a left
cyclic rotation by one byte on its input. The Col function returns a 32-bit word
packed from the four bytes given as input. We can see that the Expanded Key also contains
the initial cipher key in its original form.
The function rcon(i) is Col(Rc[i], ‘00’, ‘00’, ‘00’). Rc[i], also called
the round constant, is given by the following formula, where the power is taken in GF(2^8):
Rc[1] = ‘01’
Rc[i] = ‘02’^(i-1)
    for (i = 0; i < nk; i++)
        W[i] = Col(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3]);
    for (i = nk; i < nb * (nr + 1); i++)
    {
        temp = W[i - 1];
        if (nk <= 6)
        {
            if (i % nk == 0)
                temp = subbyte(rotbyte(temp)) ^ rcon(i/nk);
        }
        else
        {
            if (i % nk == 0)
                temp = subbyte(rotbyte(temp)) ^ rcon(i/nk);
            else if (i % nk == 4)
                temp = subbyte(temp);
        }
        W[i] = W[i - nk] ^ temp;
    }
Fig 6: Pseudo-C implementation of Key Schedule [4]
For Encryption the necessary round bits are taken from W starting from index i = 0. For
Decryption it is the reverse. The round key taken for the last round will be used in the
first round in Decryption in the same order of bits.
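The Key Schedule can be written out as a self-contained C sketch. It follows the structure of Fig 6 but makes several assumptions that are not stated in the text: bytes are packed into words with the first byte most significant, the substitution is computed on the fly instead of being read from a lookup table, and the helper names are chosen for this example only.

```c
#include <stdint.h>

/* Bitwise multiplication in GF(2^8), reduced by x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Byte substitution: inverse (as a^254) followed by the affine map. */
static uint8_t sbox_entry(uint8_t a)
{
    uint8_t x = 1;
    for (int i = 0; i < 254; i++) x = gf_mul(x, a);
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63);
}

static uint32_t subbyte(uint32_t w)
{
    return ((uint32_t)sbox_entry((uint8_t)(w >> 24)) << 24) |
           ((uint32_t)sbox_entry((uint8_t)(w >> 16)) << 16) |
           ((uint32_t)sbox_entry((uint8_t)(w >> 8)) << 8) |
            (uint32_t)sbox_entry((uint8_t)w);
}

/* Left cyclic rotation by one byte. */
static uint32_t rotbyte(uint32_t w)
{
    return (w << 8) | (w >> 24);
}

/* rcon(i) = Col(Rc[i], 00, 00, 00) with Rc[i] = 02^(i-1) in GF(2^8). */
static uint32_t rcon(int i)
{
    uint8_t rc = 0x01;
    for (int k = 1; k < i; k++) rc = gf_mul(rc, 0x02);
    return (uint32_t)rc << 24;
}

/* Expand key (4*nk bytes) into W, which must hold nb*(nr+1) words. */
void key_expansion(const uint8_t *key, uint32_t *W, int nk, int nb, int nr)
{
    for (int i = 0; i < nk; i++)
        W[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16) |
               ((uint32_t)key[4 * i + 2] << 8) | (uint32_t)key[4 * i + 3];
    for (int i = nk; i < nb * (nr + 1); i++) {
        uint32_t temp = W[i - 1];
        if (i % nk == 0)
            temp = subbyte(rotbyte(temp)) ^ rcon(i / nk);
        else if (nk > 6 && i % nk == 4)
            temp = subbyte(temp);
        W[i] = W[i - nk] ^ temp;
    }
}
```

For a 128-bit key and block, the call key_expansion(key, W, 4, 4, 10) fills 44 words; words W[0] to W[3] are the initial Round Key and W[40] to W[43] are the Round Key of the last round.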
The Galois Field GF(2^m):
A Galois Field GF(q) is a field with q elements, also called a finite field because there is
a finite number (q) of elements. A primitive element of GF(q) is an element a such that
every field element except zero can be expressed as a power of a. Each Galois Field has
at least one primitive element. If q = 2^m, where m is a positive integer, the
elements of the field can be represented by polynomials of degree less than m whose
coefficients are elements of the field GF(2), that is, 0 and 1. A primitive element of such
a field is itself such a polynomial.
Arithmetic in GF(2^8):
As we saw above, all the arithmetic is done at the byte level. In GF(2^8), the addition of
bits 1 and 1 is 0. This arithmetic cannot be implemented in software using standard
functions such as multiplication and division for finding the product and other values like
the multiplicative inverse. Manipulation of bits in software is complex and hard to debug.
Fortunately, however, since the Galois Field is represented as 8-bit values, the possible
input values for any unary operation will be one of 256 values. This can be utilized in
dividing the complex operation into a series of unary operations and performing the
unary operations using lookup tables instead of performing the actual arithmetic itself.
For example, we can do multiplication by using the logarithm and antilogarithm
functions. The logarithms of the multiplicand and the multiplier can each be found in one
step using a linear array of 256 values. These logarithms are then added as ordinary
integers modulo 255, since the nonzero elements of GF(2^8) form a cyclic group of order
255. The antilog of the sum can then be obtained by using another lookup table. A zero
operand has no logarithm and must be handled separately. Arithmetic can thus be done
much more easily at the cost of extra memory.
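The table-based multiplication can be sketched as follows. The sketch assumes ‘03’ as the primitive element used to build the tables and the reduction polynomial x^8 + x^4 + x^3 + x + 1; the table and function names are illustrative. A bitwise multiplication is included only so that the two approaches can be compared.

```c
#include <stdint.h>

static uint8_t log_table[256];   /* log_table[x] = i such that 03^i = x */
static uint8_t alog_table[256];  /* alog_table[i] = 03^i                */

/* Fill both tables by repeatedly multiplying by the generator 03. */
void build_log_tables(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        alog_table[i] = x;
        log_table[x] = (uint8_t)i;
        /* x * 03 = (x * 02) xor x */
        uint8_t x2 = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
        x = (uint8_t)(x2 ^ x);
    }
    alog_table[255] = alog_table[0];
    log_table[0] = 0;  /* zero has no logarithm; this entry is never used */
}

/* Multiplication with three lookups and one addition modulo 255. */
uint8_t gf_mul_table(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0) return 0;
    return alog_table[(log_table[a] + log_table[b]) % 255];
}

/* Bitwise multiplication, for comparison with the table method. */
uint8_t gf_mul_bitwise(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}
```

build_log_tables must be called once before gf_mul_table is used. The two tables cost 512 bytes of memory in exchange for replacing the shift-and-XOR loop.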
C code:
We use a C implementation of the algorithm to test the results of the VHDL and
Viva implementations. The code is taken from Daemen and Rijmen [3]. This
implementation is done using look-up tables. The lookup tables are stored as linear
arrays. The lookup tables are used in the ByteSub, Key Schedule and
MixColumn transformations. The independent operations on different elements in each
stage are done iteratively. The State is stored in a two dimensional array.
In the case of ByteSub, a for loop is used to iterate over all rows and columns of the
State. The transformation is done using a linear array of 256 elements from which the
input is used as an index to the array containing the transformed values. Thus the whole
transformation, of finding a multiplicative inverse and applying an affine transformation,
is done using a single lookup.
The ShiftRow transformation is done based on the bit lengths. The shifts of each row
are stored in an array. Based on whether the algorithm is in an Encryption or Decryption
stage, appropriate shifts are fetched from the array and the rows are shifted accordingly.
The MixColumn transformation is implemented by running a for loop on the number
of columns. The multiplications are done using the log and antilog lookup tables. The log
and antilog values of all of the possible 256 inputs are stored in two linear arrays. The
appropriate value is retrieved using the index. Thus the multiplication can be done by
using three lookups (two log values and one antilog value) and an addition. This will
avoid doing the complex bit wise manipulations involved in actual Galois Field
arithmetic. The polynomial to be multiplied is stored as constants in the program.
The Round Key Addition is done using an XOR. The Key is passed as a two-
dimensional array and a for loop is used to iterate on all the cells of the State.
The Key Schedule uses two lookup tables for doing the rotation and substitution. A
three-dimensional array is used to store the Expanded Key. Key selection is done on the
primary index. The Key schedule is done before the rounds, and all the key bits are
stored in an array which is used in Encryption as well as Decryption.
The C code is used only to check the results of our other implementations in VHDL and
Viva®. We implemented the algorithm in different ways to evaluate their resource usage
and timing. The code is included as Appendix A.
Chapter 2: HC36m – A Reconfigurable Computer
The platform we are targeting is an HC 36m Hypercomputer® developed by Star Bridge
Systems. The reconfigurable resources on the Hypercomputer comprise five Xilinx
Virtex-II 6000 and two Virtex-II 4000 FPGA chips organized in a proprietary manner.
The processing capability of this architecture is built upon four Processing Elements
(PEs). Each PE is a Xilinx Virtex-II 6000 FPGA chip connected to four DDR RAM
modules each of 512MB with a 90-bit wide communication link. The four PEs are
arranged in a “Quad Structure” passing through a cross-point, which is another Virtex-II
6000 chip with a 50-bit wide communication link to each PE. The Virtex-II 4000 chips
serve as a bus controller and a router. The 2.4 GHz Xeon processors on the host are
connected to the FPGA interface through a 64-bit bidirectional PCI-X bus running at
66 MHz. If the data to be sent is wider than the available bus width, the data is
multiplexed over the PCI-X bus.
Fig 7: Quad Structure [20]
Fig 8: Architecture of HC 36m [20]
The HC 36m comes with a development environment called Viva®. Viva provides a graphical
editor for designing applications, which are then synthesized by Viva and mapped onto
hardware using Xilinx tools. The design need not be constrained to a single chip, since
Viva is capable of mapping designs onto more than one chip. Viva also comes with a rich
library of objects which can be used in the design of applications. A snapshot of the
library objects is shown in Fig 9. The current version of the library comes in a sheet
called corelib.
Fig 9: Corelib
The I2ADL editor provides a graphical interface for creating applications. A design can
be stored as a sheet. The sheets can be made into objects to be reusable in other designs.
Thus a user can create his/her own library of objects and reuse them just by loading the
sheet with its objects and dragging the objects onto the new sheet.
Fig 10: Snapshot of Viva.
There are three more editors in Viva: the Data Set Editor, used to create new data sets;
the Resource Editor used for allocating resources; and the System Editor used to
manipulate constraints such as the EDIF file to be compiled, the system descriptions, the
clock period, and so forth. The object-oriented paradigm allows one to build designs
hierarchically, thus decreasing the complexity.
The most important concept for any programming language, however, is debugging, and
debugging in Viva can be very difficult. The error messages given by Viva, for example,
have not been very useful. This makes programming difficult if something goes wrong.
The “widget interface” is not really sufficient for hardware designs. It would be more
useful if there were a way to see the timing diagram for a design on the hardware.
The current Viva version is Viva 2.3. This version has some enhancements over previous
versions in terms of synthesis time, but many of the existing designs that synthesized and
executed under previous versions are not compatible with this new version. We have had
to make some changes in our designs in order to migrate to the new version.
Chapter 3: VHDL Implementation
A VHDL implementation of the algorithm for a 128-bit key and block size was done
so that its results could be compared with those of Viva in terms of silicon usage and
delay. In this implementation, lookup tables were used rather than doing the actual
GF(2^8) arithmetic. The code is included as Appendix B.
The lookup tables are stored as RAMs. There are a total of four lookup tables used in the
algorithm. Those lookup tables are stored in the files sbox_ram.vhd,
alogtable_ram.vhd, logtable_ram.vhd and rbox_ram.vhd. All lookup
tables take an index as input and output the value corresponding to that index. All the
lookup tables mentioned above except that in the rbox_ram.vhd file store 256 values
needed for a unary operation in GF(2^8). The rbox_ram.vhd file contains a lookup
table having 30 values required for the rotation operation in Key Schedule. They are
indexed starting from zero.
Since the algorithm works in GF(2^8), all the variables are defined to be large_int, a
subtype of integer that allows values in the range 0 to 255 only. Other packages and type
definitions are stored in the file packages.vhd.
Key Schedule is done before the rounds start and the Expanded Key is stored in arrays.
All the operations in the transformations of the round are implemented in parallel, in
contrast to the iterated approach used in the C implementation. This uses a great deal of
silicon resources but has minimum delay. The code was simulated using ModelSim
[15] and synthesized using the Xilinx ISE [26] compiler. The various entities of the
algorithm are explained in order of complexity and hierarchy.
shiftrow:
Since we are dealing with only one block size, this transformation can be implemented
simply by routing the inputs to the appropriate outputs, and no silicon will be used for
this transformation. The code for this is in the shiftrow.vhd file. The inputs for this
entity are sixteen values of type large_int and the outputs are merely in a shuffled
order.
roundkey:
This entity is used to do an 8-bit XOR. Since in VHDL we have only a bit wise XOR
function, the functions conv_std_logic_vector and conv_integer, available
in the ieee.std_logic_arith package, are used for conversion between an
integer and a std_logic_vector. This entity has two inputs of type large_int
and outputs a single value of same type. The implementation is in the file
roundkey.vhd.
round_roundkey:
The round_roundkey entity takes the State and key in the form of 32 inputs of type
large_int and performs an 8-bit XOR on each pair of corresponding State and key inputs.
For this, sixteen roundkey entities are used. All the XORs are implemented in parallel. The
output is 16 large_int values, which comprise the State. This entity performs the
Round Key Addition transformation in the round. The implementation is in the
round_roundkey.vhd file.
round_sbox:
This entity does the ByteSub transformation using the lookup tables (RAM_sbox). The
round_sbox entity takes State in the form of sixteen large_int inputs and passes
them through sixteen RAM_sbox entities in parallel. The output is again the State. The
code corresponding to this entity is in the round_sbox.vhd file.
addcmp:
This takes two inputs of type large_int and adds them modulo 255. The output is also
a large_int. The implementation is in the addcmp.vhd file. This is used primarily in
multiplication, as explained below.
multiply:
This entity takes two values to be multiplied as input and produces the product. All the
inputs and outputs are of type large_int. The entities used for this are the two lookup
tables RAM_logtable and RAM_alogtable. One of the inputs is given as an input to
the RAM_logtable entity. The output would be the log value of the input. The other
input is itself the log value, since it is always constant. The log value is given as an input
to avoid another lookup. These two values are given as inputs to the addcmp entity. The
output of addcmp is passed to RAM_alogtable, which provides the product. Now
the inputs are checked for zeroes. If any of the inputs is zero, then the output is returned
as zero, or else the product is passed as the output. The code is in the Multiply.vhd
file.
mix:
This entity takes a column of the State shifted in different offsets. It multiplies these four
values with the constant polynomial used in the algorithm and adds the results. The
output is one cell of the State after the MixColumn transformation. For this entity the
inputs are the four large_ints and the output is a large_int. The other entities
used here are the multiply and roundkey. Since the polynomial used in the
multiplication has two coefficients of 1, multiplication with them is redundant. Thus,
only two multiplications are used to get the other two products. Later these four values
are added using the roundkey entities and the result is passed out. The code is in the
Mix.vhd file.
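The data flow through the addcmp, multiply and mix entities can be mirrored in a short C sketch. This is a software analogue written for illustration, not a translation of the VHDL in Appendix B: the function names are borrowed from the entities, the tables are built assuming ‘03’ as the primitive element, and the zero check is applied only to the data input because the constant coefficients are never zero.

```c
#include <stdint.h>

static uint8_t log_table[256];   /* stands in for RAM_logtable  */
static uint8_t alog_table[256];  /* stands in for RAM_alogtable */

/* Build the log and antilog tables from powers of the generator 03. */
void build_tables(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        alog_table[i] = x;
        log_table[x] = (uint8_t)i;
        uint8_t x2 = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
        x = (uint8_t)(x2 ^ x);
    }
    alog_table[255] = alog_table[0];
    log_table[0] = 0;  /* unused: zero has no logarithm */
}

/* addcmp: add two log values modulo 255. */
uint8_t addcmp(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) % 255);
}

/* multiply: x times a constant that is supplied as its log value. */
uint8_t multiply(uint8_t x, uint8_t log_c)
{
    if (x == 0) return 0;
    return alog_table[addcmp(log_table[x], log_c)];
}

/* mix: one output cell, 02*a + 03*b + c + d, with addition as XOR.
   Only two multiplications are needed because two coefficients are 01. */
uint8_t mix(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint8_t p2 = multiply(a, log_table[0x02]);
    uint8_t p3 = multiply(b, log_table[0x03]);
    return (uint8_t)(p2 ^ p3 ^ c ^ d);
}
```

Calling mix four times on one column, with the inputs rotated by one position each time, gives the four cells of the transformed column.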
mixcolumn:
This entity performs the MixColumn transformation. It takes the State in the form of
sixteen large_ints and outputs the same after the transformation. For this purpose it
uses sixteen mix entities. It takes one column at a time, rotates it appropriately,
and passes it to the mix entities. The outputs of these entities are placed in the
corresponding places of the State. All the operations are done in parallel. The
implementation is in the MixColumn.vhd file.
keyschedule:
This entity takes key values for a round as an input and produces the key values for the
next round. The input is taken in the form of sixteen large_ints and the outputs are
stored in a key array. The different entities used here are roundkey,
RAM_sbox, and RAM_rbox. The inputs are routed through these entities such that they
produce the desired output. The implementation is in the keyschedule.vhd file.
round:
This constitutes a round of the algorithm. The inputs are the State and the key in the form
of 32 inputs, and the output is the State. Both inputs and outputs are of type
large_int. The State inputs are first routed through the round_sbox entity followed
by shiftrow, mixcolumn and roundkey. The key inputs are directly routed to
roundkey. The output would be the transformed State after applying one round
transformation. The implementation is in the round.vhd file.
lround:
This actually implements the last round of the algorithm, which is slightly different from
the remaining rounds. The only difference between round and lround is that the latter
does not have the mixcolumn entity. The output of the shiftrow is directly routed to
roundkey. The implementation is found in the lround.vhd file.
aes:
This entity connects all the pieces to complete the algorithm. The roundkey, round,
lround, keyschedule entities are used here. First, the key is passed as input to
keyschedule. There will be a series of ten keyschedule entities the output of each
of which is fed to the next. The initial key is fed as input to the first keyschedule. At
the end, the outputs of all the entities of keyschedule hold the Expanded Key for the
entire algorithm. The State is first passed through sixteen roundkey entities. This is the
initial Round Key Addition transformation performed prior to the rounds. Then we
have nine round entities and one lround, with the output of each passed to the next. The
output of lround is the required encrypted block. The code can be found in the file
aes.vhd.
Decryption:
Decryption is similar to Encryption, with minor differences explained below in terms of
entities for each transformation.
The first difference is the InvByteSub transformation. Instead of the RAM_sbox used
in Encryption, we use in Decryption an entity RAM_dsbox that contains the inverse of
the RAM_sbox values. The entity can be found in the file dsbox_ram.vhd.
The InvShiftRow is the transformation that is applied to nullify the ShiftRow
transformation applied in Encryption. For this we use the entity dshiftrow which is in
the file dshiftrow.vhd. This is similar to shiftrow in the sense that it just routes
the inputs to the appropriate outputs to produce the effect of shifting. The shifting is done
in such a way that it nullifies the shifting done in shiftrow.
The InvMixColumn differs from the MixColumn in two ways. First, a different
polynomial is multiplied by the State. Although the polynomial differs, it is stored
as constants in the same way as in MixColumn. The entity is
invmixcolumn and is implemented in the file invmixcolumn.vhd. The second
difference is in the mix entity. In Encryption we use a polynomial that has
two coefficients equal to one, but here the polynomial has no coefficients equal to one,
so we cannot avoid any of the multiply objects as we do in Encryption. The variation is
shown in the dmix entity in the file dmix.vhd.
The Round Key Addition transformation is the same in Encryption and
Decryption. We therefore use the same entities for both.
The order of the transformations in the round also changes in the Decryption. First the
input is routed to round_roundkey entity, which is followed by invmixcolumn,
dshiftrow and then by round_dsbox. The entity used is dround, and the
implementation can be seen in the file dround.vhd.
In Decryption, it is the first round, and not the last round, that differs from the other
rounds. The first round does not have invmixcolumn. The input is passed through
round_roundkey and then through dshiftrow and round_dsbox. The entity
representing this is the fround and the implementation can be seen in fround.vhd.
The key generation is similar to that of Encryption, but the keys are used in reverse order
compared to Encryption. The keys that are used for the first round are routed to the last
round in Decryption. Similarly, the key used in the second round is used in the ninth
round in Decryption, and the key used in the lround in Encryption goes to fround
in Decryption. The entity used for this is the daes entity and is in the file daes.vhd.
All the designs are simulated using ModelSim and synthesized using Xilinx ISE.
The results of the implementation are given and analyzed in Chapter 5.
Chapter 4: Viva Implementation
The implementation of the algorithm started with lookup tables for multiplication.
Since the on-board memory has not been supported up to this point, we have used on-
chip memory to store the lookup tables. All the values of the lookup tables are read from
files stored on the host. These constants can be read to an input horn from files by adding
the following attributes.
Fig 11: Input from a file to an input horn
A file in the following format should exist at the location given beside the Constant
attribute.
Fig 12: Format of file input
The value corresponding to the index attribute is used as an index to fetch the required
value from the file. The values in the files are synthesized as CONSTANTS into the
executable. In the example above, the value 99 is stored as a constant at the input horn. In
order to implement a lookup table, we can read all the values to the input horns and use a
multiplexer to get the required values. This approach has many problems, however.
There is no parameterized generate function to create all these in one step, and
opening the attributes list and adding a different value for the index and hard coding the
path name is a tedious job.
The index problem can be countered by using sixteen Mux(17,1) objects for storing
the values instead of a single Mux(257,1) object. This will allow us to use the horns
with the indexes given 0 to 15 for each Mux object instead of using all the values from 0
to 255 for each input horn.
Fig 13: Lookup table
The 8-bit input is exposed and split into its most significant and least significant 4-bit
quantities. The LSBs are routed to all sixteen Mux(17,1) objects. Only one of the
Muxes holds the required output; a further Mux object, which takes the MSBs as its
select index, picks that output.
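The two-level selection can be modelled in software. The following C sketch is only an illustration and not the Viva design itself: the function name lookup_two_level and the flat 256-entry array standing in for the file constants are assumptions made for this example.

```c
#include <stdint.h>

/* Software model of the split lookup table (hypothetical names).
 * Sixteen "banks" of sixteen entries each play the role of the
 * sixteen Mux(17,1) objects; a final selection by the upper nibble
 * plays the role of the additional Mux driven by the MSBs. */
static uint8_t lookup_two_level(const uint8_t table[256], uint8_t index)
{
    uint8_t msb = (uint8_t)(index >> 4);    /* selects the bank          */
    uint8_t lsb = (uint8_t)(index & 0x0F);  /* routed to every bank      */
    uint8_t bank_out[16];

    /* Every bank sees the same LSBs and produces a candidate output. */
    for (int bank = 0; bank < 16; bank++) {
        bank_out[bank] = table[bank * 16 + lsb];
    }

    /* Only the bank addressed by the MSBs holds the required value. */
    return bank_out[msb];
}
```

In this model each bank needs only the indexes 0 to 15, which mirrors the reason for splitting the table in the first place.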
The other problem faced is providing the path name to all the input horns. Initially we did
it for all the input horns as below.
Fig 14: Mux with pathnames given manually
Later we were made aware of an easier method for providing the file names to the
required input horns in the object. For this we create a Mux with input horns that have
Constant attributes initialized to *ROM_FILE.DTA. This will look like
Fig 15: Mux with pathnames pointing to a pointer
This Mux is then made into an object. To make this object point to a file, the object is
right-clicked and the attributes are changed as follows:
Fig 16: Setting the file pointer to a specific path
Initially the input values from the files were stored into registers. Although each lookup
table worked individually, there were problems with more than three lookup tables. When
we tried to synthesize more than three lookup tables, we got a C++ Exception error
followed by the corruption of the project. The frequency of this error diminished as new
versions of Viva were released, and later the mistake was corrected, resulting both in
decreased silicon usage and compilation time. Due to increasing problems with the
lookup tables we thought to import an EDIF module for some basic operations in the
algorithm. However, the EDIF generated using VHDL was not compatible with Viva. We
were later provided with a php script to do the conversion, but this did not seem to be
sufficient for our needs.
Iterative approach:
The initial implementation of an iterative approach of the algorithm was targeted at
minimal usage of silicon on the chip. There are two reasons for this. First, there was no
multi-chip communication available in Viva at that time. Our VHDL implementation
showed that a full parallel version would take two chips if the Viva synthesis tool were as
efficient as the standard Xilinx tools. Second, there were some problems encountered in
using many lookup tables. The Encryption was implemented by doing the Key
Schedule on the fly. For Decryption, the Key Schedule was done at first before
the rounds and the Expanded Key stored to be used later.
Encryption:
An iterative approach was used in the ByteSub and MixColumn transformations
inside the round and on the round also. The lookup tables required for the Encryption are
the substitution box represented by the object sbox, the Logarithm table represented as
ltable, the Antilogarithm table represented as atable, and the Rotation Box
represented as rbox. All these tables except rbox have 256 values and all are
constructed as explained above. The values are read from the files in the directories
sbox, ltable and atable placed under the directory C:\Pradeep\ on Odo
respectively. The files must be placed at that location only, since the path must be hard
coded in the design in the early versions of Viva.
Since implementation is done in an object oriented paradigm, the explanation below is
given in terms of objects created. The Encryption is a loop on the object round, which
represents a single round of the algorithm. The initial values of the key and block are
passed through the roundkey object. The output of the roundkey object and the
initial key are passed to the round object, which is then looped ten times using the For
object of the Viva library. The feedback is done using the reginit objects. The
appropriate inputs to the round object for the first round and the subsequent rounds are
selected by using the N value of the For object.
round:
In essence, the round routes the data from one stage to the next. The Key Schedule
is done on the fly as a part of the round. The inputs for the round are the block and key of
the previous round. The substituted values required for the Key Schedule are
calculated in the round_sbox object only. The N value of the outer For loop is used to
eliminate the MixColumn stage in the tenth round. It is also incremented and used as a
pointer for the rotation box of the Key Schedule. ShiftRow is implemented by
simply routing the outputs of the round_sbox object to appropriate inputs of
round_mixcolumn object.
Fig 17: Design of a round in Encryption
round_sbox:
The round_sbox is a loop over sbox4 that calculates the substituted values for a
column of the State. The outputs of all iterations are registered. The appropriate set of
registers is selected using the decode object. The N value of the For object is passed
into the decode, which compares it with values from 0 to 4 and sets the corresponding
output bit high. The ‘done’ of the sbox4 object is used to give a pulse to the next input
of the For object. The inputs are muxed and passed into the sbox4 based on the N
value of the For object. The additional four values calculated are for the Key Schedule.
sbox4:
This object is a loop around the sbox object and gives the substituted value for its input.
The outputs of all iterations are registered similarly as explained above.
round_mixcolumn:
This object is a loop around the mixcolumn object; it multiplies the column of the State
with a constant polynomial. The inputs are muxed and passed into the mixcolumn
object and the outputs are registered using the For object.
mixcolumn:
The mixcolumn object shifts the column by one for every iteration and passes them
into the mix objects, which multiply the given input with a polynomial. The output of
each iteration corresponds to a cell of the output column. The output is registered using the
RegEn object based on the iteration.
mix:
This object is a loop around the multiply object. The polynomial with which the
column is to be multiplied is stored in terms of constants. In order to eliminate one table
lookup, the logarithmic values of the coefficients of the polynomial are stored instead of
the coefficients themselves. The outputs are registered and XORed after all the iterations
are completed to get the desired value.
multiply:
The two inputs to multiply are a coefficient of the polynomial and a value of the State.
The State value is passed through the ltable and the output is
added to the other input (the stored logarithm of the coefficient). The ADC object is used
for this purpose. We need addition
modulo 255, which requires that we adjust the ADC output with the overflow bit to
obtain the desired results in all instances. The resulting value is passed through atable
to get the product. The inputs are checked for zero. If any input is zero, then the output of
the atable is discarded and zero is passed as output.
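The table-based multiplication can be sketched in C as follows. This is an illustrative model under stated assumptions, not the Viva object: the names log_tab, alog_tab, build_tables and gf_mul_log are hypothetical, the tables are generated at run time from the generator ‘03’ (consistent with the Logtable in Appendix A, where the entry for 3 is 1) rather than read from files, and the end-around carry is one common way to obtain addition modulo 255 from an 8-bit adder with an overflow bit; the text does not spell out the exact adjustment used.

```c
#include <stdint.h>

static uint8_t log_tab[256];   /* stands in for ltable (hypothetical name) */
static uint8_t alog_tab[256];  /* stands in for atable (hypothetical name) */

/* Multiply by x ('02') modulo x^8+x^4+x^3+x+1. */
static uint8_t mul02(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Fill both tables by stepping through the powers of the generator '03'. */
static void build_tables(void)
{
    uint8_t v = 1;
    for (int i = 0; i < 255; i++) {
        alog_tab[i] = v;
        log_tab[v] = (uint8_t)i;
        v = (uint8_t)(mul02(v) ^ v);   /* v = v * '03' */
    }
    alog_tab[255] = alog_tab[0];       /* exponent 255 is the same as 0 */
    log_tab[0] = 0;                    /* never used: zero is handled separately */
}

/* Product of a and b in GF(2^8) using log/antilog lookups. */
static uint8_t gf_mul_log(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0) {
        return 0;                      /* table output is discarded for zero inputs */
    }
    unsigned sum = (unsigned)log_tab[a] + (unsigned)log_tab[b];
    /* 8-bit add plus overflow bit: since 256 = 1 (mod 255), adding the
     * carry back in gives a value congruent to the sum modulo 255. */
    sum = (sum & 0xFF) + (sum >> 8);
    return alog_tab[sum];
}
```

After build_tables() has been called once, gf_mul_log(0x57, 0x83) yields 0xC1, the worked example given in the AES specification.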
roundkey:
The roundkey object is a collection of XOR gates that XOR the key for this particular
round with the State. All the XORs are done in parallel.
keyschedule:
Key Schedule is done on the fly in the case of Encryption. The index for the rotation box
is calculated based on the iteration. The substituted values required are calculated in the
round_sbox object itself and the values are passed to the key schedule.
The decode object is used in almost all of the above objects. It functions as a DeMux.
A value is passed through the Equal objects from the Viva libraries, which are initialized
to all the possible values of the input. The appropriate output based on the input is set
high.
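As a rough software analogue of the decode object, the C sketch below compares the input against each possible value and raises exactly one output bit. The function name decode and the 16-output limit are assumptions made for this illustration only.

```c
#include <stdint.h>

/* One-hot decoder model (hypothetical). Bit k of the result is set
 * when value == k, mimicking one Equal comparator per output. If the
 * value matches none of the comparators, no output is raised. */
static uint16_t decode(uint8_t value, uint8_t num_outputs)
{
    uint16_t outputs = 0;
    for (uint8_t k = 0; k < num_outputs && k < 16; k++) {
        if (value == k) {
            outputs |= (uint16_t)(1u << k);
        }
    }
    return outputs;
}
```

With five outputs, as in round_sbox, an N value of 3 would set only bit 3, which could then enable the fourth set of registers.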
Decryption:
The basic difference between Encryption and Decryption is the Key Schedule. The
Key Schedule is done before the rounds in this instance. The keys for all the rounds
are stored in a stack-like structure, from which the key for the round is retrieved in every
iteration. All the other stages of the Decryption are similar to Encryption and require little
explanation. The round_isbox has only four iterations, since the values required for
the Key Schedule need not be calculated. The imix of the round_imixcolumn
takes a different polynomial from the one used in Encryption.
The keysh object is a loop around the keyschedule object explained above. The
outputs are packed and registered for every iteration. Later they are routed in reverse
order (since we require the keys in reverse order in Decryption) into a Mux. The selection
in the Mux is given the N value of the For loop. The rounds are started after the Key
Schedule is done.
The files corresponding to isbox of round_isbox object are stored at
C:\Pradeep\isbox on Odo.
Fig 18: Design of round in Decryption
Expanding the loop on the round:
Since our main aim is to use maximum resources in terms of silicon, we started by
expanding the loop on the round in order to check the efficiency of Viva in synthesizing a
larger design. For this, some changes were made to the round object explained above.
The object described above uses a Mux to eliminate the MixColumn transformation in
the final round in the Encryption and InvMixColumn transformation in the first round
for Decryption. Since we are using different objects for every round, a round object was
created with a mixcolumn object and without any Mux for all the rounds except the
last one in Encryption and the first one in Decryption. Another object, lround, was
created for Encryption; this is a round without a mixcolumn. A corresponding object
was created for Decryption. The Key Schedule follows the same approach as above: the key is
calculated on the fly in the case of Encryption; for Decryption we used the keysh object
explained above. Both designs worked, and the results are given in the next chapter.
Non-iterative Approach:
A non-iterative approach is started by expanding the loops in ByteSub stage and also in
the MixColumn stage. Given the fact that a single lookup table took 160 slices, which is
a little less than three times the number needed in the VHDL implementation, the whole algorithm
using lookup tables cannot be done in four chips if we were to expand ByteSub and
MixColumn completely. We have thus settled for iteration on these stages. The
ShiftRow is done prior to the ByteSub to accommodate this. Then for the first
iteration the first two columns are passed to the plsbox8 object, which has eight sbox objects.
Then the output of plsbox8 is passed to two plmix4 objects. The plmix4 object
multiplies a column with a polynomial and outputs the transformed column. The
plmix4 object has 4 plmix objects. The input for plmix4 is routed to each of these
objects by shifting them one at a time. The plmix object has four plmult objects that
multiply the coefficients. The outputs of the plmult objects are XORed to produce the
desired result. The outputs of the two iterations done on these stages are registered using
RegEn objects. The N value of the For loop which is used for iterations is used to
enable the appropriate set of registers. The Key Schedule is done in parallel to this
operation. The object corresponding to this is plkeyschedule. It uses four sbox
objects for obtaining the substituted values. Once the iterations are finished, the
registered values and the output of plkeyschedule are passed to the roundkey
object to complete the round transformation. There are seventy-six lookup
tables in total in this round.
Fig 19: Design of round in Encryption
Implementing the multiplication in arithmetic:
Since Viva was not able to synthesize the design with many lookup tables, the
implementation was changed by replacing the lookup tables with the actual arithmetic.
Actually, as per the algorithm, we are not required to implement the whole multiplication
in the arithmetic. Since the polynomial used in Encryption and Decryption is a constant,
two objects were designed that multiply a column of the State with the polynomial used
in Encryption and Decryption.
The multiplication is done in the Galois Field GF(2^8). In polynomial representation, the
multiplication corresponds to a product of polynomials modulo an irreducible binary
polynomial of degree 8. The polynomial used in the algorithm is x^8+x^4+x^3+x+1, which
can be represented in hexadecimal notation as ‘11B’.
Multiplication by the polynomial x, which can be represented in hexadecimal notation as
‘02’, is a left shift followed by a conditional XOR. If the left shift results in a carry, then
the result of the shift is XORed with ‘1B’. The polynomial used in Encryption has
coefficients ‘03’, ‘01’, ‘01’ and ‘02’. Multiplication with ‘02’ is done as explained above.
Multiplication with ‘01’ is the number itself. Multiplication with ‘03’ is split into
multiplication with ‘02’ plus multiplication with ‘01’. The addition is again an XOR. The
polynomial used in Decryption has the coefficients ‘09’, ‘0B’, ‘0D’, and ‘0E’. All these
are also split in terms of powers of two and XORed at the end. For example,
multiplication with ‘09’ is split into multiplication by ‘08’ XORed with multiplication by
‘01’. Multiplication by ‘08’ is achieved by three successive multiplications by ‘02’. Since
all the coefficients are multiplied in parallel and XORed, the maximum number of shifts
done in succession is equal to three in Decryption and one in Encryption. This eliminates
a number of lookup tables, thus reducing the chip resources used.
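The decomposition of the constant coefficients into shifts and XORs can be written out in a few lines of C. This is a minimal sketch for illustration; the function names (mul02, mul03, mul09, mul0B, mul0D, mul0E) are hypothetical and do not correspond to objects in the Viva design.

```c
#include <stdint.h>

/* Multiply by '02': left shift, then XOR with '1B' if a carry occurred. */
static uint8_t mul02(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Encryption coefficient '03' = '02' XOR '01'. */
static uint8_t mul03(uint8_t a) { return (uint8_t)(mul02(a) ^ a); }

/* Helpers for the higher powers of two: at most three successive shifts. */
static uint8_t mul04(uint8_t a) { return mul02(mul02(a)); }
static uint8_t mul08(uint8_t a) { return mul02(mul04(a)); }

/* Decryption coefficients, each split into powers of two and XORed. */
static uint8_t mul09(uint8_t a) { return (uint8_t)(mul08(a) ^ a); }                    /* 8+1   */
static uint8_t mul0B(uint8_t a) { return (uint8_t)(mul08(a) ^ mul02(a) ^ a); }         /* 8+2+1 */
static uint8_t mul0D(uint8_t a) { return (uint8_t)(mul08(a) ^ mul04(a) ^ a); }         /* 8+4+1 */
static uint8_t mul0E(uint8_t a) { return (uint8_t)(mul08(a) ^ mul04(a) ^ mul02(a)); }  /* 8+4+2 */
```

For the input ‘57’, the successive doublings give ‘AE’, ‘47’ and ‘8E’, so multiplication by ‘09’ is ‘8E’ XOR ‘57’ = ‘D9’.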
The left shift in Viva is implemented using the RCL objects available in the corelib
library. The carryover is fed as an input to the Mux to do the conditional XOR. The
irreducible polynomial with which the result of the shift is XORed is given as a constant.
The object is named mulbyx.
Fig 20: Multiplication by x or ‘02’
The object cmmix is used to multiply a column with the polynomial to produce one
coefficient of the result. The multiplication with the polynomial in Encryption is
implemented as follows.
Fig 21: cmmix object
The complete multiplication of the polynomial with the column of the State is
implemented by shifting the column and passing it as an input to the cmmix object. The
object corresponding to that is the cmmix4 object. The MixColumn transformation is
accomplished by using four cmmix4 objects in parallel.
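The relationship between the objects can be sketched in C. The function names below reuse the object names mulbyx, cmmix and cmmix4 purely as labels for a software model; the bodies are an assumption about how the arithmetic could be expressed and are not taken from the Viva design.

```c
#include <stdint.h>

/* Model of mulbyx: multiply by x ('02') with the conditional XOR of '1B'. */
static uint8_t mulbyx(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Model of cmmix: one output byte of the column,
 * '02'*a0 ^ '03'*a1 ^ '01'*a2 ^ '01'*a3. */
static uint8_t cmmix(uint8_t a0, uint8_t a1, uint8_t a2, uint8_t a3)
{
    return (uint8_t)(mulbyx(a0) ^ (mulbyx(a1) ^ a1) ^ a2 ^ a3);
}

/* Model of cmmix4: rotate the column by one position per output byte
 * and feed each rotation to cmmix. */
static void cmmix4(const uint8_t in[4], uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = cmmix(in[i], in[(i + 1) & 3], in[(i + 2) & 3], in[(i + 3) & 3]);
    }
}
```

Applying cmmix4 to each of the four columns of the State gives the full MixColumn transformation; in hardware the four calls would run in parallel rather than in sequence.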
Since the use of arithmetic to do the MixColumn transformation reduces the silicon
usage, a full-fledged parallel implementation can be done in the round. Previously, in
case of lookup tables, both MixColumn and ByteSub stages were iterated once in
order to make two-and-one-half rounds fit on a single chip. But in that case much of the
chip is used in the MixColumn stage due to the excessive usage of lookup tables. When
these tables are eliminated, a round needs only a little more than a tenth of a chip in case
of Encryption when implemented with no iteration.
Fig 22: Design of round in Encryption
Fig 23: Design of round in Decryption
Due to enormous synthesis times, however, the whole algorithm could not be synthesized
onto one chip. Therefore, we attempted to use two chips by placing five rounds on each
chip. Although the synthesis completed, the design did not produce correct output.
Debugging was difficult as the synthesis time was about two days.
Viva 2.3:
The initial problem with Viva 2.3 was that it did not handle files for constants. The initial
work-around proposed by Star Bridge would have required relabelling all the input horns.
Since there were 256 such horns in our initial design, this was viewed as an unacceptable
“solution.” We therefore decided to import an EDIF file generated by a VHDL
implementation. A single lookup table done in this manner took 72 slices, compared to
the 160 slices taken previously by a Viva object. Given that we had 200 lookup tables in
the entire implementation, the silicon usage was reduced by 17,600 slices, and as a result
the whole algorithm synthesized into less than half of one chip.
The implementation of the lookup table in VHDL is done using an array. The EDIF file is
generated using the fc2 compiler. This EDIF file is ported into Viva using a script written
by Heather A. Wake [25]. There are some problems with the EDIFs generated using
Synopsys, but these problems did not appear in this particular use of the Synopsys tool.
Chapter 5: Results and Conclusions
Results on VHDL:
We used ModelSim [15] to simulate the algorithm and the Xilinx ISE tools [26] to
synthesize the code. The results for independent blocks are tabulated below. The
synthesis has been done for a Virtex2 device xc2v6000, package ff1152, speed -4.
The par statistics are generated by the Xilinx tools.
Entity         Slices  Percentage  IOBs  Max Pin Delay
Lookup Table   68      1           16    6.932 ns
round_sbox     1088    3           254   16.324 ns
shiftrow       0       0           256   8.871 ns
mixcolumn      5056    14          272   25.510 ns
roundkey       128     1           384   9.989 ns
keyschedule    356     1           272   9.486 ns
lround         1216    3           384   14.089 ns
round          6150    18          400   22.888 ns
Table 3: Par Report
Based on these results, each round would require 6150 slices. If we did a complete
parallel implementation inside each round and used different instances for each round so
that we could stream the data to pipeline different blocks of data, the total number of
slices should be 9 * 6150 + 1216 = 56566 slices, plus some overhead for data movement.
Given that one ff1152 chip contains 33792 slices, the whole algorithm when done in
parallel using lookup tables should be implementable using two chips provided that the
synthesis tools in Viva are as efficient as the standard tools. If Viva were only half as
efficient at synthesis as the standard tools, then the two chips with standard synthesis
might expand to four chips using Viva and still be feasible on the four-chip HC 36m. The
results proved otherwise, however, as will be explained later in the chapter.
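The estimate above is simple arithmetic and can be reproduced with a few lines of C. The function names total_slices and chips_needed are hypothetical, and the factor of two for a less efficient synthesis tool is the assumption stated in the text, not a measured value.

```c
/* Slices for a fully unrolled design: (rounds) full rounds plus one
 * final round without MixColumn. Overhead for data movement is ignored. */
static unsigned long total_slices(unsigned rounds,
                                  unsigned long round_slices,
                                  unsigned long last_round_slices)
{
    return (unsigned long)rounds * round_slices + last_round_slices;
}

/* Number of chips needed, rounded up to a whole chip. */
static unsigned long chips_needed(unsigned long slices,
                                  unsigned long slices_per_chip)
{
    return (slices + slices_per_chip - 1) / slices_per_chip;
}
```

With the figures from Table 3, total_slices(9, 6150, 1216) gives 56566 slices; at 33792 slices per chip this rounds up to two chips, and doubling the slice count to model a synthesis tool half as efficient rounds up to four.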
Results on Viva:
Viva uses muxing to transfer data from the host to the chip and from the FPGA chip back
to the host. Since the input and output are large (around 256 bits for many of the designs),
there will be some overhead in terms of slices for moving the data from the host to the
chip. All the slice numbers listed below include overhead for input and output data
transfer.
Results for Iterative Architectures:
The following are the results for stages and rounds of iterative versions done initially. All
the results were done using a 25ns clock.
Block                          Slices  Clock Cycles
ByteSub                        794     50
MixColumn                      983     392
Round Key Addition             329     1
Key Schedule                   2895    130
Encryption round (iterative)   2082    444
Decryption round (iterative)   1715    433
Encryption round (expanded)    2050    444
Decryption round (expanded)    1592    433
Table 4: Results of round and stages in Iterative approach
In the table, “iterative” and “expanded” refer to the loop on the round:
“iterative” means the round is executed as a loop, and “expanded” means that
loop is unrolled.
Results of Expanded Architectures:
The following are the results for the stages and rounds used for expanded
architectures. Results for stages that use lookup tables are given for both Viva lookup
tables and VHDL imports. The reason for doing VHDL imports is that Viva 2.3 does not
yet support File constants as previous versions did. However, importing a VHDL
module for lookup tables proved to be very advantageous in terms of silicon as well as in
making designs work. All the results use a 15ns clock. Since no design has any iteration,
they can all be completed in one clock cycle.
Block                 Comments                                                           Slices
ByteSub               Viva lookup tables                                                 2559
ByteSub               VHDL lookup tables                                                 1174
MixColumn             Using arithmetic                                                   342
InvMixColumn          Using arithmetic                                                   651
Round Key Addition    Parallel XOR gates                                                 329
Key Schedule          For one set of keys                                                481
Encryption round *    1 iteration, multiplication using arithmetic, Viva lookup tables   12564
Encryption round      Multiplication using arithmetic, Viva lookup tables                3592
Decryption round 433  Multiplication using arithmetic, Viva lookup tables                3497
Encryption round      Multiplication using arithmetic, VHDL lookup tables                1883
Decryption round 433  Multiplication using arithmetic, VHDL lookup tables                1866
Table 5: Results of round and stages in Non-Iterative approach
The Encryption round (denoted with an asterisk in the table) done with iteration takes
four clock cycles. A single object of the aforementioned round synthesized with no
problem in Viva, requiring 37% of the chip. Since we cannot fit three complete rounds on
a single chip, we thought to split two rounds into halves to accommodate all ten rounds
on the available four chips. Considering that each round was taking 37%, which also
includes the slices for input and output, the whole 2-1/2 rounds should have taken about
90% of the chip.
However, when we tried to synthesize two rounds on a single chip, Viva was unable to do
the synthesis, responding with an out of memory error. The diagnosis from Star
Bridge Systems was that Viva was running out of memory in search of XOR gates, which
were used extensively in the design. For this reason, this architecture cannot be expanded
for the complete algorithm.
Architectures:
Table 6: Architectures implemented on HC 36m
Architecture  Module       Inside Stages  On the Round   Multiplication
A1            Encryption   Iterative      Iterative      Lookup
A2            Decryption   Iterative      Iterative      Lookup
A3            Encryption   Iterative      Non Iterative  Lookup
A4            Decryption   Iterative      Non Iterative  Lookup
A5            Encryption   One Iteration  Non Iterative  Lookup
A6            Enc 1 chip   Non Iterative  Non Iterative  Arithmetic
A7            Dec 1 chip   Non Iterative  Non Iterative  Arithmetic
A8            Enc 2 chips  Non Iterative  Non Iterative  Arithmetic
The architectures A1 to A4 were done with iterations inside the stages. The A5 architecture
was actually aimed at implementing the architecture used in VHDL to compare the
resource usage and timing. But since the synthesis tool in Viva is not as efficient as
standard synthesis tools, the algorithm cannot be implemented without iterations. Worse
yet, we could not complete the full algorithm in the architecture, since Viva failed to
synthesize more than one round on a single chip (even though one round takes much less
than half a chip).
Architecture  Slices  Clock cycles  Comments
A1            2285    4069          Works on Viva 2.2 but not on Viva 2.3
A2            4656    4480          Works both on Viva 2.2 and 2.3
A3            16393   4056          Works on Viva 2.2 but not on Viva 2.3
A4            14395   4077          Works both on Viva 2.2 and 2.3
A5            ---     ---           Only one iteration works on Viva 2.2.
A6            15470   1             Does not synthesize in Viva 2.2 but by replacing the Viva lookup
                                    tables with VHDL lookup tables synthesized in Viva 2.3
A7            18653   1             Does not synthesize in Viva 2.2 but by replacing the Viva lookup
                                    tables with VHDL lookup tables synthesized in Viva 2.3
A8            ---     ---           Synthesizes on Viva 2.2 but does not give correct results.
                                    Not required on Viva 2.3
Table 7: Results of various architectures in Viva2.2 and Viva2.3
The architecture A8 was implemented when A6 and A7 failed to synthesize in Viva 2.2.
Considering the fact that a single round of this architecture took around 10% of a chip,
the whole algorithm might be synthesizable on a single chip if we consider the overhead
for input and output. For the Viva 2.3 implementation, the lookup tables were replaced by
VHDL modules. Since the architectures A6 and A7 synthesized on Viva 2.3, the
architecture A8 was not tested on Viva 2.3.
It has been a source of great frustration that we have not been able to test Viva on a
reasonable full AES design. Based on the synthesis of parts of AES using Viva and on
the synthesis of part and all of AES using standard synthesis tools, there should be no
fundamental obstacle to a complete AES implementation on the HC 36m. However, the
use of Viva to implement AES in its entirety will have to wait for a later and corrected
version of the software.
Throughput:
The problem with calculating the throughput of all the architectures on the HC 36m is the
inability of Viva to support what Star Bridge Systems refers to as FILE I/O, the transfer
of data from and to files on the host through the HC 36m hardware. Also, the hardware is
presently limited to a very slow speed due to the use of a rather primitive core doing the
communication on the PCIX bus.
But if we consider the core itself as we have implemented it, rather than considering the
limitations of the machine on which it is implemented, we would achieve a significant increase
in throughput in Non-iterative architectures over the basic iterative architectures.
Architecture Throughput (Gbps) Frequency of the clock(MHz)
A1 0.0012 40
A2 0.0011 40
A6 8.5334 66
A7 8.5334 66
Table 8: Throughput for different architectures
The throughputs listed in the table for architectures A6 and A7 do not reflect their actual
speeds since the HC 36m cannot be run faster than 66 MHz. In order to get an estimate of
actual throughput, we decided to run both A6 and A7 on a single chip routing the output
of A6 to A7 without any intermediate registers. The design took 33,790 slices, two fewer
than the total slices available on a single chip, and the design ran at a 15ns clock.
Theoretically, then, both A6 and A7 should have no more than an 8 ns delay. Based on
this, the throughput of A6 and A7 can be estimated at 16 Gbps at a 125 MHz clock
frequency.
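The throughput figures follow from the block size, the number of clock cycles per block, and the clock period. The C sketch below shows the calculation; the function name throughput_gbps is hypothetical, and it assumes one 128-bit block is produced every `cycles` clock cycles with no pipelining.

```c
/* Throughput in Gbps for an unpipelined core.
 * Bits per nanosecond is numerically equal to gigabits per second. */
static double throughput_gbps(unsigned block_bits,
                              unsigned long cycles_per_block,
                              double clock_period_ns)
{
    return (double)block_bits / ((double)cycles_per_block * clock_period_ns);
}
```

For A1, with 4069 cycles at a 25 ns clock, this gives roughly 0.0012 Gbps; for A6 and A7, one cycle at 15 ns gives roughly 8.53 Gbps, and one cycle at the estimated 8 ns delay gives 16 Gbps.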
Comparisons:
In any demonstration of technology, it is necessary to compare new results against those
already achieved by others. Listed below are some of the other commercial and
academic implementations of AES done on Virtex chips in a non-iterative approach.
Design Device Throughput Slices BRAMs Frequency
P. Chodowiec et al [2] Virtex XCV1000 -6 12.16 12600 80 95
SIG-AES-E [13] Virtex-E XCV1000E -8 16.54 11719 0 129.2
SIG-AES-E [13] Virtex-II XC2V2000 -5 17.80 10750 0 139.1
Helion, Pipelined [12] unknown >16 unknown
Kris Gaj [11] Virtex-E XCV1000E -8 16.00 9199 80 134.5
North Pole Engg. [18] unknown 12.8 5840 160 100
M. McLoone et al [14] Virtex-E XCV812 -8 6.95 2222 100 54.35
Table 9: Other Implementations on the Virtex family chips
The throughput above is listed in Gbps and the clock frequency in MHz. The SIG
implementations above do the GF(2^8) arithmetic using a quadratic extension of a field
GF(2^4); the authors claim a significant improvement in the ability of the synthesis tools to
extract efficient logic. Although all the implementations listed above exploited the
parallelism and pipelining inherent in the algorithm, they differ in many aspects, making
it difficult to obtain a straightforward comparison with our implementation.
One difficulty is with the amount of pipelining, which directly affects the throughput
since our implementation has no pipelining. Second, the chip on which they are
implemented differs in the number of slices, maximum frequency, Block RAMs, and so
forth. Other aspects to be considered include the synthesis tool when compared against
Viva’s synthesis tool, since Viva’s synthesis cannot be expected to compete with more
established commercial synthesis tools in terms of efficiency.
Finally, the throughput results for our implementation are estimates that have not been
tested in practice, which makes comparisons more difficult and highlights testing as an
important part of future work.
Future Work:
The architectures above can be further enhanced for varying bit lengths of Key. All the
architectures above can be directly used for an ECB (Electronic Code Book) mode of
encryption. Since the ECB mode is not regarded as secure compared to the other modes
such as CBC (Cipher Block Chaining), OFB (Output Feedback) and Counter mode, it
would be good if we could embed some of these modes into the architecture and give the
user flexibility in terms of security required as well as data rate. One more enhancement
would be to combine the encryption and decryption cores into a single core, thus making
it possible to shift between encryption and decryption by means of a select bit. Though
much of the silicon cannot be reused in the round, the Key Schedule is the same for
Encryption and Decryption.
If we consider a strictly FPGA implementation instead of an HC 36m implementation,
the first order of business would be to ascertain the actual throughputs of the existing
cores. It may be useful to try to produce an efficient AES core that took less silicon and
yet had a considerable throughput. Since we have already looked into different
approaches for the implementations of different transformations in Rijndael, it would be
easy to try different architectures for cheaper implementation in terms of silicon.
References
[1] Kazumaro Aoki and Helger Lipmaa, Fast Implementations of AES Candidates,
The Third Advanced Encryption Standard Candidate Conference, New York,
NY, April 13-14, 2000.
[2] P. Chodowiec, P. Khuon, and K. Gaj, Fast Implementations of Secret-Key
Block Ciphers Using Mixed Inner- and Outer-Round Pipelining, ACM/SIGDA
Ninth International Symposium on Field Programmable Gate Arrays,
Monterey, California, February 11-13, 2001.
[3] J. Daemen and V. Rijmen. The Design of Rijndael: AES - The Advanced
Encryption Standard (Information Security and Cryptography). Springer
Verlag, Berlin, 2001.
[4] Joan Daemen and Vincent Rijmen. AES Proposal: Rijndael, Mar 09, 1999,
<http://csrc.nist.gov/CryptoToolkit/aes/rijndael/Rijndael.pdf>, as referenced on
Nov 15th, 2003.
[5] Joan Daemen and Vincent Rijmen, Rijndael for AES, AES Candidate
Conference, NY, April 13-14, 2000.
[6] J. Daemen and V. Rijmen, AES Public Comment from the Rijndael Team,
1999, <http://csrc.nist.gov/CryptoToolkit/aes/round1/comments/990414-jdaemen.pdf>,
as referenced on Nov 15th, 2003.
[7] A.J. Elbirt, W. Yip, B. Chetwynd and C. Paar, An FPGA Implementation and
Performance Evaluation of the AES Block Cipher Candidate Algorithm
Finalists, The Third Advanced Encryption Standard (AES3) Candidate
Conference, New York, NY, April 13-14, 2000.
[8] Brian Gladman’s Homepage, Implementations of AES, as referenced on Nov
15th 2003, <http://fp.gladman.plus.com/cryptography_technology/rijndael/>
[9] K. Gaj and P. Chodowiec. Fast implementation and fair comparison of the final
candidates for Advanced Encryption Standard using Field Programmable Gate
Arrays. Proc. RSA Security Conference, Cryptographer's Track, San
Francisco, April 9, 2001.
[10] K. Gaj and P. Chodowiec, Comparison of the hardware performance of the
AES candidates using reconfigurable hardware, Third Advanced Encryption
Standard (AES) Candidate Conference, New York, NY, April 13-14, 2000.
[11] Kris Gaj’s Website, Implementation of AES cores,
<http://ece.gmu.edu/crypto/rijndael.htm>, as referenced on Nov 15th, 2003.
[12] Helion Technologies Inc., Website, AES (Rijndael) Cores, as referenced on
Nov 15th, 2003, <http://www.heliontech.com/core2.htm>.
[13] Kimmo U. Jarvinen, Matti T. Tommiska and Jorma O. Skytta, A Fully
Pipelined Memoryless 17.8 Gbps AES-128 Encryptor, International
Symposium on Field Programmable Gate Arrays, Monterey, California,
February 23-25, 2003.
[14] Maire McLoone and J.V. McCanny, High Performance Single-Chip FPGA
Rijndael Algorithm Implementations, Workshop on Cryptographic Hardware
and Embedded Systems, May 14-16, 2001.
[15] ModelSim, Inc., website, <http://www.model.com/>, as referenced on Nov 15th,
2003.
[16] S. Murphy and M.J.B. Robshaw, Essential Algebraic Structure within the AES,
Information Security Group, University of London, Surrey, U.K, 2002.
[17] National Institute for Standards and Technology, AES Home Page,
<http://csrc.nist.gov/CryptoToolkit/aes>, as referenced on Nov 15th, 2003.
[18] North Pole Engineering, Inc., Website, AES Core User’s Manual,
<http://www.northpoleengineering.com/Documents/AES%20Manual.pdf>, as
referenced on Nov 15th, 2003.
[19] Rijmen’s personal page, <http://www.esat.kuleuven.ac.be/~rijmen/rijndael>,
as referenced on Nov 15th, 2003.
[20] Star Bridge Systems, Inc., web site, <http://www.starbridgesystems.com>, as
referenced on Nov 15th, 2003.
[21] Star Bridge Systems, Viva Tutorials, as referenced on Nov 15th, 2003,
<http://www.starbridgesystems.com/support/tutorials.html>.
[22] Star Bridge Systems, HC36m, as referenced on Nov 15th, 2003,
<http://www.starbridgesystems.com/products/hc36.html>.
[23] The Rijndael official page, <http://www.rijndael.com/>, as referenced on
Nov 15th, 2003.
[24] Bryan Weeks, Mark Bean, Tom Rozylowicz and Chris Ficke, Hardware
Performance Simulations of Round 2 Advanced Encryption Standard
Algorithms, AES Candidate Conference, New York, NY, April 13-14, 2000.
[25] Heather A. Wake, Translating EDIF using Perl, CSE Department Technical
Report, University of South Carolina, 2003.
[26] Xilinx Inc., Xilinx ISE, as referenced on Nov 15th, 2003,
<http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=ISE+Foundation>.
Appendix A: C code used for testing [3]
#include <stdio.h>

typedef unsigned char word8;
typedef unsigned int word32;
word8 Logtable[256] ={0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3, 100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193, 125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120, 101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142, 150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56,102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16,126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186, 43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87,175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232,44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160,127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123,183,204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157,151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209, 83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171, 68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165, 103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7};
word8 Alogtable[256] = { 1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53, 95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170, 229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49, 83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205, 76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136, 131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154, 181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163, 254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160, 251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65, 195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117, 159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128, 155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84, 252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202, 69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14, 18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23, 57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1, };
word8 S[256] = { 99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118, 202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192, 183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21, 4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117, 9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132, 83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207, 208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168, 81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210, 205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115, 96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219, 224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121, 231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8, 186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138, 112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158, 225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223, 140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22, };
word8 Si[256] = { 82, 9, 106, 213, 48, 54, 165, 56, 191, 64, 163, 158, 129, 243, 215, 251, 124, 227, 57, 130, 155, 47, 255, 135, 52, 142, 67, 68, 196, 222, 233, 203, 84, 123, 148, 50, 166, 194, 35, 61, 238, 76, 149, 11, 66, 250, 195, 78, 8, 46, 161, 102, 40, 217, 36, 178, 118, 91, 162, 73, 109, 139, 209, 37, 114, 248, 246, 100, 134, 104, 152, 22, 212, 164, 92, 204, 93, 101, 182, 146, 108, 112, 72, 80, 253, 237, 185, 218, 94, 21, 70, 87, 167, 141, 157, 132, 144, 216, 171, 0, 140, 188, 211, 10, 247, 228, 88, 5, 184, 179, 69, 6, 208, 44, 30, 143, 202, 63, 15, 2, 193, 175, 189, 3, 1, 19, 138, 107, 58, 145, 17, 65, 79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115, 150, 172, 116, 34, 231, 173, 53, 133, 226, 249, 55, 232, 28, 117, 223, 110, 71, 241, 26, 113, 29, 41, 197, 137, 111, 183, 98, 14, 170, 24, 190, 27, 252, 86, 62, 75, 198, 210, 121, 32, 154, 219, 192, 254, 120, 205, 90, 244, 31, 221, 168, 51, 136, 7, 199, 49, 177, 18, 16, 89, 39, 128, 236, 95, 96, 81, 127, 169, 25, 181, 74, 13, 45, 229, 122, 159, 147, 201, 156, 239, 160, 224, 59, 77, 174, 42, 245, 176, 200, 235, 187, 60, 131, 83, 153, 97, 23, 43, 4, 126, 186, 119, 214, 38, 225, 105, 20, 99, 85, 33, 12, 125, };
word32 RC[30] = {
  0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B,
  0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A, 0x2F, 0x5E, 0xBC, 0x63,
  0xC6, 0x97, 0x35, 0x6A, 0xD4, 0xB3, 0x7D, 0xFA, 0xEF, 0xC5 };

#define MAXBC 8
#define MAXKC 8
#define MAXROUNDS 14

static word8 shifts[5][4] = {
  0, 1, 2, 3,
  0, 1, 2, 3,
  0, 1, 2, 3,
  0, 1, 2, 4,
  0, 1, 3, 4 };

static int numrounds[5][5] = {
  10, 11, 12, 13, 14,
  11, 11, 12, 13, 14,
  12, 12, 12, 13, 14,
  13, 13, 13, 13, 14,
  14, 14, 14, 14, 14 };
int BC,KC,ROUNDS;
word8 mul(word8 a, word8 b) {
  /* multiply two elements of GF(2^8) using the log and antilog tables */
  if (a && b) return Alogtable[(Logtable[a] + Logtable[b]) % 255];
  else return 0;
}
void AddRoundKey(word8 a[4][MAXBC], word8 rk[4][MAXBC]) {
  /* XOR the round key into the state */
  int i, j;
  for (i = 0; i < 4; i++)
    for (j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}
void SubBytes(word8 a[4][MAXBC], word8 box[256]) {
  /* replace every state byte by its image under the given substitution table */
  int i, j;
  for (i = 0; i < 4; i++)
    for (j = 0; j < BC; j++) a[i][j] = box[a[i][j]];
}
void ShiftRows(word8 a[4][MAXBC], word8 d) {
  /* d == 0 rotates rows 1..3 left (encryption); otherwise right (decryption) */
  word8 tmp[MAXBC];
  int i, j;
  if (d == 0) {
    for (i = 1; i < 4; i++) {
      for (j = 0; j < BC; j++)
        tmp[j] = a[i][(j + shifts[BC-4][i]) % BC];
      for (j = 0; j < BC; j++) a[i][j] = tmp[j];
    }
  }
  else {
    for (i = 1; i < 4; i++) {
      for (j = 0; j < BC; j++)
        tmp[j] = a[i][(BC + j - shifts[BC-4][i]) % BC];
      for (j = 0; j < BC; j++) a[i][j] = tmp[j];
    }
  }
}
void Mixcolumns(word8 a[4][MAXBC]) {
  /* multiply every column by the fixed polynomial with coefficients 2, 3, 1, 1 */
  word8 b[4][MAXBC];
  int i, j;
  for (j = 0; j < BC; j++)
    for (i = 0; i < 4; i++)
      b[i][j] = mul(2, a[i][j])
        ^ mul(3, a[(i + 1) % 4][j])
        ^ a[(i + 2) % 4][j]
        ^ a[(i + 3) % 4][j];
  for (i = 0; i < 4; i++)
    for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}
void InvMixColumn(word8 a[4][MAXBC]) {
  /* inverse of Mixcolumns: coefficients 0xe, 0xb, 0xd, 0x9 */
  word8 b[4][MAXBC];
  int i, j;
  for (j = 0; j < BC; j++)
    for (i = 0; i < 4; i++)
      b[i][j] = mul(0xe, a[i][j])
        ^ mul(0xb, a[(i + 1) % 4][j])
        ^ mul(0xd, a[(i + 2) % 4][j])
        ^ mul(0x9, a[(i + 3) % 4][j]);
  for (i = 0; i < 4; i++)
    for (j = 0; j < BC; j++) a[i][j] = b[i][j];
}
int KeyExpansion(word8 k[4][MAXKC], word8 W[MAXROUNDS+1][4][MAXBC]) {
  /* expand the cipher key k into the round keys W */
  int i, j, t, Rcpointer = 1;
  word8 tk[4][MAXKC];
  for (j = 0; j < KC; j++)
    for (i = 0; i < 4; i++)
      tk[i][j] = k[i][j];
  t = 0;
  /* copy the cipher key into the first round key columns */
  for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
    for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
  while (t < (ROUNDS+1)*BC) {
    /* derive the next KC columns of key material */
    for (i = 0; i < 4; i++)
      tk[i][0] ^= S[tk[(i+1)%4][KC-1]];
    tk[0][0] ^= RC[Rcpointer++];
    if (KC <= 6)
      for (j = 1; j < KC; j++)
        for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
    else {
      for (j = 1; j < 4; j++)
        for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
      for (i = 0; i < 4; i++) tk[i][KC/2] ^= S[tk[i][KC/2 - 1]];
      for (j = 5; j < KC; j++)
        for (i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
    }
    for (j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
      for (i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
  }
  return 0;
}
int Encrypt(word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC]) {
  int r;
  AddRoundKey(a, rk[0]);
  for (r = 1; r < ROUNDS; r++) {
    SubBytes(a, S);
    ShiftRows(a, 0);
    Mixcolumns(a);
    AddRoundKey(a, rk[r]);
  }
  /* the final round has no Mixcolumns step */
  SubBytes(a, S);
  ShiftRows(a, 0);
  AddRoundKey(a, rk[ROUNDS]);
  return 0;
}
int Decrypt(word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC]) {
  int r;
  /* undo the final encryption round first */
  AddRoundKey(a, rk[ROUNDS]);
  SubBytes(a, Si);
  ShiftRows(a, 1);
  for (r = ROUNDS-1; r > 0; r--) {
    AddRoundKey(a, rk[r]);
    InvMixColumn(a);
    SubBytes(a, Si);
    ShiftRows(a, 1);
  }
  AddRoundKey(a, rk[0]);
  return 0;
}
int main() {
  int i, j;
  word8 a[4][MAXBC], rk[MAXROUNDS+1][4][MAXBC], sk[4][MAXKC];
  /* encrypt and decrypt an all-zero block under an all-zero key for every size */
  for (KC = 4; KC <= 8; KC++)
    for (BC = 4; BC <= 8; BC++) {
      ROUNDS = numrounds[KC-4][BC-4];
      for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++) a[i][j] = 0;
      for (j = 0; j < KC; j++)
        for (i = 0; i < 4; i++) sk[i][j] = 0;
      KeyExpansion(sk, rk);
      Encrypt(a, rk);
      printf("block length %d key length %d\n", 32*BC, 32*KC);
      for (j = 0; j < BC; j++)
        for (i = 0; i < 4; i++) printf("%02d ", a[i][j]);
      printf("\n");
      Decrypt(a, rk);
      for (j = 0; j < 4; j++)
        for (i = 0; i < 4; i++) printf("%02d", a[i][j]);
      printf("\n\n");
    }
  return 0;
}
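
The Logtable and Alogtable arrays in the listing above hold discrete logarithms and powers with respect to the element 0x03 of GF(2^8). The following is a minimal sketch, not part of the thesis code, of how such tables can be regenerated and how a table-based multiply can be cross-checked against a bit-by-bit multiply modulo x^8 + x^4 + x^3 + x + 1. The names gf_mul_slow, build_tables, gf_mul_table, and tables_agree are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul_slow(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;                                   /* add a if bit set */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00)); /* a = a * x       */
        b >>= 1;
    }
    return p;
}

static uint8_t alog_tab[256]; /* alog_tab[e] = 0x03 raised to the power e */
static uint8_t log_tab[256];  /* log_tab[x]  = e such that 0x03^e == x    */

/* Fill both tables by repeatedly multiplying by the generator 0x03. */
static void build_tables(void) {
    uint8_t x = 1;
    for (int e = 0; e < 255; e++) {
        alog_tab[e] = x;
        log_tab[x] = (uint8_t)e;
        x = gf_mul_slow(x, 3);
    }
    assert(x == 1);     /* 0x03 has order 255, so the powers wrap around to 1 */
    alog_tab[255] = 1;  /* same convention as the last entry of Alogtable    */
    log_tab[0] = 0;     /* placeholder: zero has no logarithm                */
}

/* Table-based multiply, written like mul() in the listing above. */
static uint8_t gf_mul_table(uint8_t a, uint8_t b) {
    if (a && b) return alog_tab[(log_tab[a] + log_tab[b]) % 255];
    return 0;
}

/* Returns 1 if both multipliers agree on all 65536 operand pairs. */
static int tables_agree(void) {
    build_tables();
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (gf_mul_table((uint8_t)a, (uint8_t)b) !=
                gf_mul_slow((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

Calling tables_agree() once at start-up is a cheap guard against a mistyped table entry, which is otherwise hard to spot in a 256-element list.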
Appendix B: VHDL Implementation
packages.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

package int_types is
  subtype large_int is integer range 0 to 255;
end package;

library ieee;
use ieee.std_logic_arith.all;
use ieee.std_logic_1164.all;
use work.int_types.all;

package key_types is
  type keyarray is array (0 to 15) of large_int;
end package;
sbox_ram.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_sbox is
  port (ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_sbox is
begin
  process (ma)
    type box is array (0 to 255) of large_int;
    constant s : box := (99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118, 202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192, 183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21, 4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117, 9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132, 83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207, 208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168, 81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210, 205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115, 96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219, 224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121, 231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8, 186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138, 112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158, 225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223, 140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22);
  begin
    mb <= s(ma);
  end process;
end architecture;
logtable_ram.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_logtable is
  port (ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_logtable is
begin
  process (ma)
    type box is array (0 to 255) of large_int;
    constant logtable : box := (0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3, 100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193, 125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120, 101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142, 150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56, 102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16, 126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186, 43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87, 175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232, 44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160, 127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123, 183, 204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157, 151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209, 83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171, 68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165, 103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7);
  begin
    mb <= logtable(ma);
  end process;
end architecture;
alogtable.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_alogtable is
  port (ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_alogtable is
begin
  process (ma)
    type box is array (0 to 255) of large_int;
    constant alogtable : box := (1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53, 95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170, 229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49, 83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205, 76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136, 131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154, 181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163, 254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160, 251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65, 195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117, 159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128, 155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84, 252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202, 69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14, 18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23, 57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1);
  begin
    mb <= alogtable(ma);
  end process;
end architecture;
rbox_ram.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_rbox is
  port (ma : in large_int; mb : out large_int);
end entity;

architecture behav of RAM_rbox is
begin
  process (ma)
    type TRC is array (0 to 29) of large_int;
    constant RC : TRC := (16#00#, 16#01#, 16#02#, 16#04#, 16#08#, 16#10#, 16#20#, 16#40#, 16#80#, 16#1B#,
                          16#36#, 16#6C#, 16#D8#, 16#AB#, 16#4D#, 16#9A#, 16#2F#, 16#5E#, 16#BC#, 16#63#,
                          16#C6#, 16#97#, 16#35#, 16#6A#, 16#D4#, 16#B3#, 16#7D#, 16#FA#, 16#EF#, 16#C5#);
  begin
    mb <= RC(ma);
  end process;
end architecture;
addcmp.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
use work.int_types.all;

entity addcmp is
  port (ma1,ma2 : in large_int; mb1 : out large_int);
end entity;

architecture behav of addcmp is
begin
  process (ma1,ma2)
  begin
    if ma1 + ma2 > 255 then
      mb1 <= ma1 + ma2 - 255;
    else
      mb1 <= ma1 + ma2;
    end if;
  end process;
end architecture;
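
The addcmp entity reduces a sum of two table indices with a compare and subtract, where the C test code in Appendix A uses "% 255". The two are not identical: a sum of exactly 255 stays 255 here but becomes 0 under "% 255". Because the antilog table holds the value 1 at both index 0 and index 255, the lookup result is the same either way. The C model below is a hedged sketch of that behaviour; the function names are illustrative and the check assumes both inputs are logarithms in the range 0 to 254.

```c
#include <assert.h>

/* Software model of addcmp: add, then subtract 255 once if the sum exceeds 255. */
static int addcmp(int ma1, int ma2) {
    int sum = ma1 + ma2;
    assert(ma1 >= 0 && ma1 <= 255 && ma2 >= 0 && ma2 <= 255); /* large_int range */
    return (sum > 255) ? sum - 255 : sum;
}

/* Returns 1 if addcmp is congruent to (a + b) mod 255 for all logarithm pairs
   and never leaves the index range 0..255 of the antilog table. */
static int addcmp_matches_mod255(void) {
    for (int a = 0; a < 255; a++)
        for (int b = 0; b < 255; b++) {
            int r = addcmp(a, b);
            if (r < 0 || r > 255) return 0;
            if (r % 255 != (a + b) % 255) return 0;
        }
    return 1;
}
```

A comparator and subtractor avoid a general modulo operator, which is typically far more expensive to synthesize than a single conditional subtraction.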
shiftrow.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity shiftrow is
  port (sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end entity;

architecture behav of shiftrow is
begin
  storage: process (sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16)
  begin
    sb1 <= sa1;
    sb2 <= sa6;
    sb3 <= sa11;
    sb4 <= sa16;
    sb5 <= sa5;
    sb6 <= sa10;
    sb7 <= sa15;
    sb8 <= sa4;
    sb9 <= sa9;
    sb10 <= sa14;
    sb11 <= sa3;
    sb12 <= sa8;
    sb13 <= sa13;
    sb14 <= sa2;
    sb15 <= sa7;
    sb16 <= sa12;
  end process;
end architecture;
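
The fixed wiring in shiftrow.vhd can be summarized by one index formula. With the sixteen state bytes numbered 0 to 15 in column order (sa1 is byte 0, sa16 is byte 15), output byte k is taken from input byte (5 * k) mod 16. The sketch below, with illustrative names that do not appear in the thesis, states the formula in C and compares it with the wiring of the listing above.

```c
#include <stdint.h>

/* ShiftRows for a 128-bit state stored column by column. */
static void shift_rows(const uint8_t in[16], uint8_t out[16]) {
    for (int k = 0; k < 16; k++)
        out[k] = in[(5 * k) % 16];
}

/* Zero-based source index of each output, read off shiftrow.vhd:
   sb1<=sa1, sb2<=sa6, sb3<=sa11, sb4<=sa16, sb5<=sa5, and so on. */
static const int vhdl_wiring[16] = {
    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
};

/* Returns 1 if the formula reproduces the VHDL wiring exactly. */
static int matches_vhdl_wiring(void) {
    uint8_t in[16], out[16];
    for (int k = 0; k < 16; k++) in[k] = (uint8_t)k; /* byte value = its index */
    shift_rows(in, out);
    for (int k = 0; k < 16; k++)
        if (out[k] != vhdl_wiring[k]) return 0;
    return 1;
}
```

In hardware the permutation costs no logic at all, since it is pure routing; the formula is mainly useful for checking the sixteen assignments against a software model.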
roundkey.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use work.int_types.all;

entity roundkey is
  port (a,b : in large_int; c : out large_int);
end entity;

architecture behav of roundkey is
begin
  storage: process (a,b)
  begin
    c <= conv_integer(conv_std_logic_vector(a,8) xor conv_std_logic_vector(b,8));
  end process;
end architecture;
round_roundkey.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity round_roundkey is
  port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_roundkey is
  component roundkey
    port (a,b : in large_int; c : out large_int);
  end component;
begin
  block18: roundkey port map (aa1, k1, b1);
  block19: roundkey port map (aa2, k2, b2);
  block20: roundkey port map (aa3, k3, b3);
  block21: roundkey port map (aa4, k4, b4);
  block22: roundkey port map (aa5, k5, b5);
  block23: roundkey port map (aa6, k6, b6);
  block24: roundkey port map (aa7, k7, b7);
  block25: roundkey port map (aa8, k8, b8);
  block26: roundkey port map (aa9, k9, b9);
  block27: roundkey port map (aa10, k10, b10);
  block28: roundkey port map (aa11, k11, b11);
  block29: roundkey port map (aa12, k12, b12);
  block30: roundkey port map (aa13, k13, b13);
  block31: roundkey port map (aa14, k14, b14);
  block32: roundkey port map (aa15, k15, b15);
  block33: roundkey port map (aa16, k16, b16);
end architecture;
round_sbox.vhd
use work.int_types.all;

entity round_sbox is
  port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_sbox is
  component RAM_sbox
    port (ma : in large_int; mb : out large_int);
  end component;
begin
  block18: RAM_sbox port map (aa1, b1);
  block19: RAM_sbox port map (aa2, b2);
  block20: RAM_sbox port map (aa3, b3);
  block21: RAM_sbox port map (aa4, b4);
  block22: RAM_sbox port map (aa5, b5);
  block23: RAM_sbox port map (aa6, b6);
  block24: RAM_sbox port map (aa7, b7);
  block25: RAM_sbox port map (aa8, b8);
  block26: RAM_sbox port map (aa9, b9);
  block27: RAM_sbox port map (aa10, b10);
  block28: RAM_sbox port map (aa11, b11);
  block29: RAM_sbox port map (aa12, b12);
  block30: RAM_sbox port map (aa13, b13);
  block31: RAM_sbox port map (aa14, b14);
  block32: RAM_sbox port map (aa15, b15);
  block33: RAM_sbox port map (aa16, b16);
end architecture;
multiply.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
use work.int_types.all;

entity multiply is
  port (ma1,ma2 : in large_int; mb1 : out large_int);
end entity;

architecture behav of multiply is
  component RAM_logtable is
    port (ma : in large_int; mb : out large_int);
  end component;
  component RAM_alogtable is
    port (ma : in large_int; mb : out large_int);
  end component;
  component addcmp is
    port (ma1,ma2 : in large_int; mb1 : out large_int);
  end component;
  signal ap,bp,dp,ep,hp,ip : large_int;
begin
  dut1: RAM_logtable port map (ma1, bp);
  dut3: addcmp port map (bp, ma2, dp);
  dut4: RAM_alogtable port map (dp, ip);
  process (ma1,ma2,ip)
  begin
    if ma1 = 0 or ma2 = 0 then
      mb1 <= 0;
    else
      mb1 <= ip;
    end if;
  end process;
end architecture;
mix.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity mix is
  port (ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int; mb1 : out large_int);
end entity;

architecture behav of mix is
  component multiply
    port (ma1,ma2 : in large_int; mb1 : out large_int);
  end component;
  component roundkey
    port (a,b : in large_int; c : out large_int);
  end component;
  signal aa1,aa2,aa3,aa6 : large_int;
begin
  block1: multiply port map (ma1, pc1, aa1);
  block2: multiply port map (ma2, pc2, aa2);
  block3: roundkey port map (aa1, aa2, aa3);
  block6: roundkey port map (ma4, ma3, aa6);
  block7: roundkey port map (aa6, aa3, mb1);
end architecture;
mixcolumn.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity mixcolumn is
  port (ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end entity;

architecture behav of mixcolumn is
  component mix
    port (ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int; mb1 : out large_int);
  end component;
begin
  block1: mix port map (ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4,mb1);
  block2: mix port map (ma2,ma3,ma4,ma1,pc1,pc2,pc3,pc4,mb2);
  block3: mix port map (ma3,ma4,ma1,ma2,pc1,pc2,pc3,pc4,mb3);
  block4: mix port map (ma4,ma1,ma2,ma3,pc1,pc2,pc3,pc4,mb4);
  block5: mix port map (ma5,ma6,ma7,ma8,pc1,pc2,pc3,pc4,mb5);
  block6: mix port map (ma6,ma7,ma8,ma5,pc1,pc2,pc3,pc4,mb6);
  block7: mix port map (ma7,ma8,ma5,ma6,pc1,pc2,pc3,pc4,mb7);
  block8: mix port map (ma8,ma5,ma6,ma7,pc1,pc2,pc3,pc4,mb8);
  block9: mix port map (ma9,ma10,ma11,ma12,pc1,pc2,pc3,pc4,mb9);
  block10: mix port map (ma10,ma11,ma12,ma9,pc1,pc2,pc3,pc4,mb10);
  block11: mix port map (ma11,ma12,ma9,ma10,pc1,pc2,pc3,pc4,mb11);
  block12: mix port map (ma12,ma9,ma10,ma11,pc1,pc2,pc3,pc4,mb12);
  block13: mix port map (ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4,mb13);
  block14: mix port map (ma14,ma15,ma16,ma13,pc1,pc2,pc3,pc4,mb14);
  block15: mix port map (ma15,ma16,ma13,ma14,pc1,pc2,pc3,pc4,mb15);
  block16: mix port map (ma16,ma13,ma14,ma15,pc1,pc2,pc3,pc4,mb16);
end architecture;
lround.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity lround is
  port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of lround is
  signal ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16 : large_int;
  signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
  signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
  component round_sbox
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
  component round_roundkey
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
          k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
  component shiftrow
    port (sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
          sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
  end component;
begin
  block1: round_sbox
    port map (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
              tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
  block34: shiftrow
    port map (tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
              ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);
  block4: round_roundkey
    port map (ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,
              k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);
end architecture;
round.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity round is
  port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round is
  signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
  signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
  signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;
  component round_sbox
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
  component round_roundkey
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
          k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
  component shiftrow
    port (sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
          sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
  end component;
  component mixcolumn
    port (ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
          pc1,pc2,pc3,pc4 : in large_int;
          mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
  end component;
begin
  block1: round_sbox
    port map (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
              tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
  block34: shiftrow
    port map (tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
              ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);
  block3: mixcolumn
    port map (ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,
              pc1,pc2,pc3,pc4,
              tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);
  block4: round_roundkey
    port map (tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,
              k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
              b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);
end architecture;
keyschedule.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity keyschedule is
  port (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int;
        owk : out keyarray;
        RCp : out large_int);
end entity;

architecture behav of keyschedule is
  component RAM_rbox is
    port (ma : in large_int; mb : out large_int);
  end component;
  component RAM_sbox is
    port (ma : in large_int; mb : out large_int);
  end component;
  component roundkey
    port (a,b : in large_int; c : out large_int);
  end component;
  signal aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,out1,out2,out3,out4,out5,out6,out7,out8,out9,out10,out11,out12,out13,out14,out15,out0 : large_int;
begin
  block1: RAM_sbox port map (a14, aa1);
  block2: roundkey port map (a1, aa1, aa2);
  block3: RAM_sbox port map (a15, aa3);
  block4: roundkey port map (a2, aa3, out1);
  block5: RAM_sbox port map (a16, aa4);
  block6: roundkey port map (aa4, a3, out2);
  block7: RAM_sbox port map (a13, aa5);
  block8: roundkey port map (aa5, a4, out3);
  block9: RAM_rbox port map (RCpointer, aa6);
  block10: roundkey port map (aa6, aa2, out0);
  block11: roundkey port map (a5, out0, out4);
  block12: roundkey port map (a6, out1, out5);
  block13: roundkey port map (a7, out2, out6);
  block14: roundkey port map (a8, out3, out7);
  block15: roundkey port map (a9, out4, out8);
  block16: roundkey port map (a10, out5, out9);
  block17: roundkey port map (a11, out6, out10);
  block18: roundkey port map (a12, out7, out11);
  block19: roundkey port map (a13, out8, out12);
  block20: roundkey port map (a14, out9, out13);
  block21: roundkey port map (a15, out10, out14);
  block22: roundkey port map (a16, out11, out15);
  process (RCpointer,out1,out2,out3,out4,out5,out6,out7,out8,out9,out10,out11,out12,out13,out14,out15,out0)
  begin
    RCp <= RCpointer + 1;
    owk(0) <= out0;
    owk(1) <= out1;
    owk(2) <= out2;
    owk(3) <= out3;
    owk(4) <= out4;
    owk(5) <= out5;
    owk(6) <= out6;
    owk(7) <= out7;
    owk(8) <= out8;
    owk(9) <= out9;
    owk(10) <= out10;
    owk(11) <= out11;
    owk(12) <= out12;
    owk(13) <= out13;
    owk(14) <= out14;
    owk(15) <= out15;
  end process;
end architecture;
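
The dataflow of keyschedule.vhd corresponds to one step of the AES-128 key expansion: the last key column is rotated, passed through the S-box, combined with the round constant, and then XORed down the four columns. The C model below is a hedged sketch of that step, not the author's code. Bytes 0 to 15 stand for a1 to a16, rcon stands for the RC table entry selected by RCpointer, and the S-box is computed from its definition (field inverse followed by the affine map) only to keep the example short. All function names are illustrative, and the test vector is the 128-bit key expansion example of FIPS-197, Appendix A.1.

```c
#include <stdint.h>
#include <string.h>

/* Bit-by-bit multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box computed from its definition instead of a 256-entry table. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;                       /* zero is mapped to zero */
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int r = 1; r <= 4; r++)           /* XOR of four left rotations */
        s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));
    return (uint8_t)(s ^ 0x63);
}

/* One key schedule step, wired as in keyschedule.vhd. */
static void next_round_key(const uint8_t prev[16], uint8_t rcon, uint8_t next[16]) {
    next[0] = (uint8_t)(prev[0] ^ sbox(prev[13]) ^ rcon); /* out0: a1, S(a14), RC */
    next[1] = (uint8_t)(prev[1] ^ sbox(prev[14]));        /* out1: a2, S(a15)     */
    next[2] = (uint8_t)(prev[2] ^ sbox(prev[15]));        /* out2: a3, S(a16)     */
    next[3] = (uint8_t)(prev[3] ^ sbox(prev[12]));        /* out3: a4, S(a13)     */
    for (int k = 4; k < 16; k++)                          /* out4 .. out15        */
        next[k] = (uint8_t)(prev[k] ^ next[k - 4]);
}

/* Returns 1 if the first round key of the FIPS-197 A.1 example is reproduced. */
static int fips197_first_round_key_ok(void) {
    static const uint8_t key[16] = {
        0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
        0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c };
    static const uint8_t expected[16] = {
        0xa0, 0xfa, 0xfe, 0x17, 0x88, 0x54, 0x2c, 0xb1,
        0x23, 0xa3, 0x39, 0x39, 0x2a, 0x6c, 0x76, 0x05 };
    uint8_t next[16];
    next_round_key(key, 0x01, next);       /* 0x01 is RC[1] in the RC table */
    return memcmp(next, expected, 16) == 0;
}
```

Chaining ten such steps, each fed with the previous output and the next round constant, yields the ten round keys of AES-128.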
aes.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity aes is
  port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        RCpointer,pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of aes is
  signal ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16 : large_int;
  signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
  signal sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16 : large_int;
  signal aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16 : large_int;
  signal bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16 : large_int;
  signal cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16 : large_int;
  signal dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16 : large_int;
  signal fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16 : large_int;
  signal gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16 : large_int;
  signal hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16 : large_int;
  signal wpk1,wpk2,wpk3,wpk4,wpk5,wpk6,wpk7,wpk8,wpk9,wpk10 : keyarray;
  signal RC2,RC3,RC4,RC5,RC6,RC7,RC8,RC9,RC10,RC11 : large_int;
  component keyschedule
    port (a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int;
          owk : out keyarray;
          RCp : out large_int);
  end component;
  component round
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
          k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
          pc1,pc2,pc3,pc4 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
  component roundkey
    port (a,b : in large_int; c : out large_int);
  end component;
  component lround
    port (aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
          k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
          b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
  end component;
begin
  block01: keyschedule
    port map (k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,RCpointer,wpk1,RC2);
  block02: keyschedule
    port map (wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),RC2,wpk2,RC3);
  block03: keyschedule
    port map (wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),RC3,wpk3,RC4);
  block04: keyschedule
    port map (wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),RC4,wpk4,RC5);
  block05: keyschedule
    port map (wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),RC5,wpk5,RC6);
  block06: keyschedule
    port map (wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),RC6,wpk6,RC7);
  block07: keyschedule
    port map (wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),RC7,wpk7,RC8);
  block08: keyschedule
    port map (wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),RC8,wpk8,RC9);
  block09: keyschedule
    port map (wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),RC9,wpk9,RC10);
  block010: keyschedule
    port map (wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),RC10,wpk10,RC11);
  block2: roundkey port map (aa1, k1, ap1);
  block3: roundkey port map (aa2, k2, ap2);
  block4: roundkey port map (aa3, k3, ap3);
  block5: roundkey port map (aa4, k4, ap4);
  block6: roundkey port map (aa5, k5, ap5);
  block7: roundkey port map (aa6, k6, ap6);
  block8: roundkey port map (aa7, k7, ap7);
  block9: roundkey port map (aa8, k8, ap8);
  block10: roundkey port map (aa9, k9, ap9);
  block11: roundkey port map (aa10, k10, ap10);
  block12: roundkey port map (aa11, k11, ap11);
  block13: roundkey port map (aa12, k12, ap12);
  block14: roundkey port map (aa13, k13, ap13);
  block15: roundkey port map (aa14, k14, ap14);
  block16: roundkey port map (aa15, k15, ap15);
  block17: roundkey port map (aa16, k16, ap16);
  block18: round
    port map (ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16,
              wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),
              pc1,pc2,pc3,pc4,
              tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
  block19: round
    port map (tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
              wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),
              pc1,pc2,pc3,pc4,
              sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16);
  block20: round
    port map (sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16,
              wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),
              pc1,pc2,pc3,pc4,
              aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16);
  block21: round
    port map (aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16,
              wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),
              pc1,pc2,pc3,pc4,
              bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16);
block22: roundport map( bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16,wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),pc1,pc2,pc3,pc4,cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16);
block23: roundport map( cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16,wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),pc1,pc2,pc3,pc4,dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16);
block24: roundport map( dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16,wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),pc1,pc2,pc3,pc4,fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16);
block25: roundport map( fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16,wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),pc1,pc2,pc3,pc4,gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16);
block26: roundport map(gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16,wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),pc1,pc2,pc3,pc4,hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16);
block27: lroundport map( hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16,wpk10(0),wpk10(1),wpk10(2),wpk10(3),wpk10(4),wpk10(5),wpk10(6),wpk10(7),wpk10(8),wpk10(9),wpk10(10),wpk10(11),wpk10(12),wpk10(13),wpk10(14),wpk10(15),b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);end architecture;
Dsbox_ram.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity RAM_dsbox is
  port( ma : in large_int;
        mb : out large_int);
end entity;

architecture behav of RAM_dsbox is
begin
  process (ma)
    type box is array (0 to 255) of large_int;
    -- Inverse S-box lookup table, indexed by the input byte value.
    constant s : box := (
       82,   9, 106, 213,  48,  54, 165,  56, 191,  64, 163, 158, 129, 243, 215, 251,
      124, 227,  57, 130, 155,  47, 255, 135,  52, 142,  67,  68, 196, 222, 233, 203,
       84, 123, 148,  50, 166, 194,  35,  61, 238,  76, 149,  11,  66, 250, 195,  78,
        8,  46, 161, 102,  40, 217,  36, 178, 118,  91, 162,  73, 109, 139, 209,  37,
      114, 248, 246, 100, 134, 104, 152,  22, 212, 164,  92, 204,  93, 101, 182, 146,
      108, 112,  72,  80, 253, 237, 185, 218,  94,  21,  70,  87, 167, 141, 157, 132,
      144, 216, 171,   0, 140, 188, 211,  10, 247, 228,  88,   5, 184, 179,  69,   6,
      208,  44,  30, 143, 202,  63,  15,   2, 193, 175, 189,   3,   1,  19, 138, 107,
       58, 145,  17,  65,  79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
      150, 172, 116,  34, 231, 173,  53, 133, 226, 249,  55, 232,  28, 117, 223, 110,
       71, 241,  26, 113,  29,  41, 197, 137, 111, 183,  98,  14, 170,  24, 190,  27,
      252,  86,  62,  75, 198, 210, 121,  32, 154, 219, 192, 254, 120, 205,  90, 244,
       31, 221, 168,  51, 136,   7, 199,  49, 177,  18,  16,  89,  39, 128, 236,  95,
       96,  81, 127, 169,  25, 181,  74,  13,  45, 229, 122, 159, 147, 201, 156, 239,
      160, 224,  59,  77, 174,  42, 245, 176, 200, 235, 187,  60, 131,  83, 153,  97,
       23,  43,   4, 126, 186, 119, 214,  38, 225, 105,  20,  99,  85,  33,  12, 125
    );
  begin
    mb <= s(ma);
  end process;
end architecture;
dshiftrow.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity dshiftrow is
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end entity;

architecture behav of dshiftrow is
begin
  storage: process(sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16)
  begin
    -- Inverse row shift on a state stored column by column:
    -- row 0 is unchanged, rows 1 to 3 move right by 1 to 3 columns.
    sb1  <= sa1;
    sb2  <= sa14;
    sb3  <= sa11;
    sb4  <= sa8;
    sb5  <= sa5;
    sb6  <= sa2;
    sb7  <= sa15;
    sb8  <= sa12;
    sb9  <= sa9;
    sb10 <= sa6;
    sb11 <= sa3;
    sb12 <= sa16;
    sb13 <= sa13;
    sb14 <= sa10;
    sb15 <= sa7;
    sb16 <= sa4;
  end process;
end architecture;
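The byte permutation in dshiftrow.vhd can be cross-checked in software. The following is a minimal C sketch, not part of the thesis code: the function name inv_shift_rows is hypothetical, and it assumes that the ports sa1..sa16 and sb1..sb16 correspond to array indices 0..15 with the state stored column by column.

```c
#include <stdint.h>

/* Software model of the dshiftrow permutation (hypothetical helper).
 * Assumption: the byte in row r, column c sits at index r + 4*c,
 * so index 0 corresponds to sa1/sb1 and index 15 to sa16/sb16. */
void inv_shift_rows(const uint8_t in[16], uint8_t out[16])
{
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            /* Row r moves r columns to the right, so the byte landing
             * in column c comes from column (c - r) mod 4. */
            int src_col = (c - r + 4) % 4;
            out[r + 4 * c] = in[r + 4 * src_col];
        }
    }
}
```

Filling the input with the values 1 to 16 makes each output value equal to the number of the sa port it came from, which can be compared line by line with the assignments in the listing (for example, sb2 <= sa14 corresponds to out[1] == 14).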
round_dsbox.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity round_dsbox is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of round_dsbox is

component RAM_dsbox
  port( ma : in large_int;
        mb : out large_int);
end component;

begin

block18: RAM_dsbox port map(aa1, b1);
block19: RAM_dsbox port map(aa2, b2);
block20: RAM_dsbox port map(aa3, b3);
block21: RAM_dsbox port map(aa4, b4);
block22: RAM_dsbox port map(aa5, b5);
block23: RAM_dsbox port map(aa6, b6);
block24: RAM_dsbox port map(aa7, b7);
block25: RAM_dsbox port map(aa8, b8);
block26: RAM_dsbox port map(aa9, b9);
block27: RAM_dsbox port map(aa10, b10);
block28: RAM_dsbox port map(aa11, b11);
block29: RAM_dsbox port map(aa12, b12);
block30: RAM_dsbox port map(aa13, b13);
block31: RAM_dsbox port map(aa14, b14);
block32: RAM_dsbox port map(aa15, b15);
block33: RAM_dsbox port map(aa16, b16);

end architecture;
dmix.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity dmix is
  port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int;
        mb1 : out large_int);
end entity;

architecture behav of dmix is

component multiply
  port( ma1,ma2 : in large_int;
        mb1 : out large_int);
end component;

component roundkey
  port( a,b : in large_int;
        c : out large_int);
end component;

signal aa1,aa2,aa3,aa4,aa5,aa6 : large_int;

begin

block1: multiply port map(ma1, pc1, aa1);
block2: multiply port map(ma2, pc2, aa2);
block3: roundkey port map(aa1, aa2, aa3);
block4: multiply port map(ma3, pc3, aa4);
block5: multiply port map(ma4, pc4, aa5);
block6: roundkey port map(aa4, aa5, aa6);
block7: roundkey port map(aa6, aa3, mb1);

end architecture;
invmixcolumn.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;

entity invmixcolumn is
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end entity;

architecture behav of invmixcolumn is

component dmix
  port( ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4 : in large_int;
        mb1 : out large_int);
end component;

begin

block1: dmix port map(ma1,ma2,ma3,ma4,pc1,pc2,pc3,pc4,mb1);
block2: dmix port map(ma2,ma3,ma4,ma1,pc1,pc2,pc3,pc4,mb2);
block3: dmix port map(ma3,ma4,ma1,ma2,pc1,pc2,pc3,pc4,mb3);
block4: dmix port map(ma4,ma1,ma2,ma3,pc1,pc2,pc3,pc4,mb4);

block5: dmix port map(ma5,ma6,ma7,ma8,pc1,pc2,pc3,pc4,mb5);
block6: dmix port map(ma6,ma7,ma8,ma5,pc1,pc2,pc3,pc4,mb6);
block7: dmix port map(ma7,ma8,ma5,ma6,pc1,pc2,pc3,pc4,mb7);
block8: dmix port map(ma8,ma5,ma6,ma7,pc1,pc2,pc3,pc4,mb8);

block9: dmix port map(ma9,ma10,ma11,ma12,pc1,pc2,pc3,pc4,mb9);
block10: dmix port map(ma10,ma11,ma12,ma9,pc1,pc2,pc3,pc4,mb10);
block11: dmix port map(ma11,ma12,ma9,ma10,pc1,pc2,pc3,pc4,mb11);
block12: dmix port map(ma12,ma9,ma10,ma11,pc1,pc2,pc3,pc4,mb12);

block13: dmix port map(ma13,ma14,ma15,ma16,pc1,pc2,pc3,pc4,mb13);
block14: dmix port map(ma14,ma15,ma16,ma13,pc1,pc2,pc3,pc4,mb14);
block15: dmix port map(ma15,ma16,ma13,ma14,pc1,pc2,pc3,pc4,mb15);
block16: dmix port map(ma16,ma13,ma14,ma15,pc1,pc2,pc3,pc4,mb16);

end architecture;
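The structure of dmix.vhd and invmixcolumn.vhd can be modelled in C to check one column against a known test vector. This is only a sketch under assumptions that this listing does not confirm: that the multiply component performs multiplication in GF(2^8) with the AES reduction polynomial, that roundkey is a bytewise XOR, and that pc1..pc4 are driven with the standard inverse MixColumns coefficients 0x0E, 0x0B, 0x0D, 0x09. The function names gf_mul, dmix_model and inv_mix_column are hypothetical.

```c
#include <stdint.h>

/* Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
 * Assumed to match what the multiply component computes. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                 /* add a for each set bit of b */
        uint8_t carry = a & 0x80;
        a <<= 1;                    /* multiply a by x */
        if (carry)
            a ^= 0x1b;              /* reduce modulo the AES polynomial */
        b >>= 1;
    }
    return p;
}

/* Model of dmix: m[0]*pc[0] xor m[1]*pc[1] xor m[2]*pc[2] xor m[3]*pc[3]. */
uint8_t dmix_model(const uint8_t m[4], const uint8_t pc[4])
{
    return gf_mul(m[0], pc[0]) ^ gf_mul(m[1], pc[1]) ^
           gf_mul(m[2], pc[2]) ^ gf_mul(m[3], pc[3]);
}

/* Model of one column of invmixcolumn: output i feeds dmix with the
 * column rotated left by i positions, as in block1..block4. */
void inv_mix_column(const uint8_t in[4], const uint8_t pc[4], uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint8_t rot[4];
        for (int j = 0; j < 4; j++)
            rot[j] = in[(i + j) % 4];
        out[i] = dmix_model(rot, pc);
    }
}
```

With the assumed coefficients, the column 8E 4D A1 BC should map back to DB 13 53 45, and a column of four equal bytes should be left unchanged, since 0x0E xor 0x0B xor 0x0D xor 0x09 equals 0x01.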
fround.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity fround is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of fround is

signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;

component round_dsbox
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component round_roundkey
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component dshiftrow
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end component;

component invmixcolumn
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end component;

begin

-- Data flow: block1 (round key) -> block4 (inverse S-box) -> block3 (inverse row shift).
-- The ttap signals and the invmixcolumn component are declared but not used in this round.
block4: round_dsbox
  port map(tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,
           tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block3: dshiftrow
  port map(tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
           b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);

block1: round_roundkey
  port map(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
           k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
           tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);

end architecture;
dround.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity dround is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of dround is

signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16 : large_int;
signal tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16 : large_int;

component round_dsbox
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component round_roundkey
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component dshiftrow
  port( sa1,sa2,sa3,sa4,sa5,sa6,sa7,sa8,sa9,sa10,sa11,sa12,sa13,sa14,sa15,sa16 : in large_int;
        sb1,sb2,sb3,sb4,sb5,sb6,sb7,sb8,sb9,sb10,sb11,sb12,sb13,sb14,sb15,sb16 : out large_int);
end component;

component invmixcolumn
  port( ma1,ma2,ma3,ma4,ma5,ma6,ma7,ma8,ma9,ma10,ma11,ma12,ma13,ma14,ma15,ma16,
        pc1,pc2,pc3,pc4 : in large_int;
        mb1,mb2,mb3,mb4,mb5,mb6,mb7,mb8,mb9,mb10,mb11,mb12,mb13,mb14,mb15,mb16 : out large_int);
end component;

begin

-- Data flow: block1 (round key) -> block2 (inverse column mix)
--            -> block4 (inverse S-box) -> block3 (inverse row shift).
block4: round_dsbox
  port map(ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16,
           tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);

block3: dshiftrow
  port map(tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
           b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16);

block2: invmixcolumn
  port map(tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16,
           pc1,pc2,pc3,pc4,
           ttap1,ttap2,ttap3,ttap4,ttap5,ttap6,ttap7,ttap8,ttap9,ttap10,ttap11,ttap12,ttap13,ttap14,ttap15,ttap16);

block1: round_roundkey
  port map(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
           k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
           tttap1,tttap2,tttap3,tttap4,tttap5,tttap6,tttap7,tttap8,tttap9,tttap10,tttap11,tttap12,tttap13,tttap14,tttap15,tttap16);

end architecture;
daes.vhd
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use work.int_types.all;
use work.key_types.all;

entity daes is
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        RCpointer,pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end entity;

architecture struct of daes is

signal ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16 : large_int;
signal tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16 : large_int;
signal sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16 : large_int;
signal aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16 : large_int;
signal bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16 : large_int;
signal cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16 : large_int;
signal dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16 : large_int;
signal fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16 : large_int;
signal gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16 : large_int;
signal hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16 : large_int;
signal wpk1,wpk2,wpk3,wpk4,wpk5,wpk6,wpk7,wpk8,wpk9,wpk10 : keyarray;
signal RC2,RC3,RC4,RC5,RC6,RC7,RC8,RC9,RC10,RC11 : large_int;

component keyschedule
  port( a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,RCpointer : in large_int;
        owk : out keyarray;
        RCp : out large_int);
end component;

component dround
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,
        pc1,pc2,pc3,pc4 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

component roundkey
  port( a,b : in large_int;
        c : out large_int);
end component;

component fround
  port( aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
        k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16 : in large_int;
        b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15,b16 : out large_int);
end component;

begin

block01: keyschedule
  port map(k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15,k16,RCpointer,wpk1,RC2);
block02: keyschedule
  port map(wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),RC2,wpk2,RC3);
block03: keyschedule
  port map(wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),RC3,wpk3,RC4);
block04: keyschedule
  port map(wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),RC4,wpk4,RC5);
block05: keyschedule
  port map(wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),RC5,wpk5,RC6);
block06: keyschedule
  port map(wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),RC6,wpk6,RC7);
block07: keyschedule
  port map(wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),RC7,wpk7,RC8);
block08: keyschedule
  port map(wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),RC8,wpk8,RC9);
block09: keyschedule
  port map(wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),RC9,wpk9,RC10);
block010: keyschedule
  port map(wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),RC10,wpk10,RC11);

block27: fround
  port map(aa1,aa2,aa3,aa4,aa5,aa6,aa7,aa8,aa9,aa10,aa11,aa12,aa13,aa14,aa15,aa16,
           wpk10(0),wpk10(1),wpk10(2),wpk10(3),wpk10(4),wpk10(5),wpk10(6),wpk10(7),wpk10(8),wpk10(9),wpk10(10),wpk10(11),wpk10(12),wpk10(13),wpk10(14),wpk10(15),
           ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16);

block18: dround
  port map(ap1,ap2,ap3,ap4,ap5,ap6,ap7,ap8,ap9,ap10,ap11,ap12,ap13,ap14,ap15,ap16,
           wpk9(0),wpk9(1),wpk9(2),wpk9(3),wpk9(4),wpk9(5),wpk9(6),wpk9(7),wpk9(8),wpk9(9),wpk9(10),wpk9(11),wpk9(12),wpk9(13),wpk9(14),wpk9(15),
           pc1,pc2,pc3,pc4,
           tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16);
block19: dround
  port map(tap1,tap2,tap3,tap4,tap5,tap6,tap7,tap8,tap9,tap10,tap11,tap12,tap13,tap14,tap15,tap16,
           wpk8(0),wpk8(1),wpk8(2),wpk8(3),wpk8(4),wpk8(5),wpk8(6),wpk8(7),wpk8(8),wpk8(9),wpk8(10),wpk8(11),wpk8(12),wpk8(13),wpk8(14),wpk8(15),
           pc1,pc2,pc3,pc4,
           sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16);
block20: dround
  port map(sap1,sap2,sap3,sap4,sap5,sap6,sap7,sap8,sap9,sap10,sap11,sap12,sap13,sap14,sap15,sap16,
           wpk7(0),wpk7(1),wpk7(2),wpk7(3),wpk7(4),wpk7(5),wpk7(6),wpk7(7),wpk7(8),wpk7(9),wpk7(10),wpk7(11),wpk7(12),wpk7(13),wpk7(14),wpk7(15),
           pc1,pc2,pc3,pc4,
           aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16);
block21: dround
  port map(aap1,aap2,aap3,aap4,aap5,aap6,aap7,aap8,aap9,aap10,aap11,aap12,aap13,aap14,aap15,aap16,
           wpk6(0),wpk6(1),wpk6(2),wpk6(3),wpk6(4),wpk6(5),wpk6(6),wpk6(7),wpk6(8),wpk6(9),wpk6(10),wpk6(11),wpk6(12),wpk6(13),wpk6(14),wpk6(15),
           pc1,pc2,pc3,pc4,
           bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16);
block22: dround
  port map(bap1,bap2,bap3,bap4,bap5,bap6,bap7,bap8,bap9,bap10,bap11,bap12,bap13,bap14,bap15,bap16,
           wpk5(0),wpk5(1),wpk5(2),wpk5(3),wpk5(4),wpk5(5),wpk5(6),wpk5(7),wpk5(8),wpk5(9),wpk5(10),wpk5(11),wpk5(12),wpk5(13),wpk5(14),wpk5(15),
           pc1,pc2,pc3,pc4,
           cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16);
block23: dround
  port map(cap1,cap2,cap3,cap4,cap5,cap6,cap7,cap8,cap9,cap10,cap11,cap12,cap13,cap14,cap15,cap16,
           wpk4(0),wpk4(1),wpk4(2),wpk4(3),wpk4(4),wpk4(5),wpk4(6),wpk4(7),wpk4(8),wpk4(9),wpk4(10),wpk4(11),wpk4(12),wpk4(13),wpk4(14),wpk4(15),
           pc1,pc2,pc3,pc4,
           dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16);
block24: dround
  port map(dap1,dap2,dap3,dap4,dap5,dap6,dap7,dap8,dap9,dap10,dap11,dap12,dap13,dap14,dap15,dap16,
           wpk3(0),wpk3(1),wpk3(2),wpk3(3),wpk3(4),wpk3(5),wpk3(6),wpk3(7),wpk3(8),wpk3(9),wpk3(10),wpk3(11),wpk3(12),wpk3(13),wpk3(14),wpk3(15),
           pc1,pc2,pc3,pc4,
           fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16);
block25: dround
  port map(fap1,fap2,fap3,fap4,fap5,fap6,fap7,fap8,fap9,fap10,fap11,fap12,fap13,fap14,fap15,fap16,
           wpk2(0),wpk2(1),wpk2(2),wpk2(3),wpk2(4),wpk2(5),wpk2(6),wpk2(7),wpk2(8),wpk2(9),wpk2(10),wpk2(11),wpk2(12),wpk2(13),wpk2(14),wpk2(15),
           pc1,pc2,pc3,pc4,
           gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16);
block26: dround
  port map(gap1,gap2,gap3,gap4,gap5,gap6,gap7,gap8,gap9,gap10,gap11,gap12,gap13,gap14,gap15,gap16,
           wpk1(0),wpk1(1),wpk1(2),wpk1(3),wpk1(4),wpk1(5),wpk1(6),wpk1(7),wpk1(8),wpk1(9),wpk1(10),wpk1(11),wpk1(12),wpk1(13),wpk1(14),wpk1(15),
           pc1,pc2,pc3,pc4,
           hap1,hap2,hap3,hap4,hap5,hap6,hap7,hap8,hap9,hap10,hap11,hap12,hap13,hap14,hap15,hap16);

block2: roundkey port map(hap1, k1, b1);
block3: roundkey port map(hap2, k2, b2);
block4: roundkey port map(hap3, k3, b3);
block5: roundkey port map(hap4, k4, b4);
block6: roundkey port map(hap5, k5, b5);
block7: roundkey port map(hap6, k6, b6);
block8: roundkey port map(hap7, k7, b7);
block9: roundkey port map(hap8, k8, b8);
block10: roundkey port map(hap9, k9, b9);
block11: roundkey port map(hap10, k10, b10);
block12: roundkey port map(hap11, k11, b11);
block13: roundkey port map(hap12, k12, b12);
block14: roundkey port map(hap13, k13, b13);
block15: roundkey port map(hap14, k14, b14);
block16: roundkey port map(hap15, k15, b15);
block17: roundkey port map(hap16, k16, b16);

end architecture;