Comparison of Hardware Performance of Selected Phase II eSTREAM
Transcript of Comparison of Hardware Performance of Selected Phase II eSTREAM
Comparison of Hardware Performanceof Selected Phase II
eSTREAM Candidates
Kris Gaj
Gabriel Southern
Ramakrishna Bachimanchi
&
Fall 2006 GMU ECE 545: Introduction to VHDL class
George Mason University
Goal
Comparison of Profile II (hardware) Phase 2 Focus candidates:
• Grain• Mickey-128• Phelix• Trivium
Two additional reference points:
• A5/1 (old & insecure GSM standard)• AES (compact architecture)
Two hardware technologies:
• Xilinx Spartan 3 FPGAs• TSMC 90 nm standard-cell library ASICs
Genesis & approach
• Part of GMU Fall 2006 graduate courseECE 545 Introduction to VHDL
• Individual 6-week project
• 4 students working independently on each eSTREAM cipher
• best code for each algorithm selected at the endof the semester
• selected designs verified and revised in order to assure• correct functionality• standard interface & control• uniform design & coding style
eSTREAMcipher
clk
reset
enc_dec
data_in
data_in_ready
data_in_write
d
data_out
writefull
d
Fixed interface
key_IV
key_IV_ready
key_IV_write
k
Two independent parameters
d – number of bits processed per clock cycle (radix)
k – number of bits of key/IV loaded per clock cycle
d encryption/decryptionthroughput
area
# of pins
+
ksetup time(key & IV loading+ initialization) area
# of pins
All results generated with k = d
Methodology
Specification
Execution Unit Control Unit
Block
diagram
Algorithmic
State Machine
VHDL code VHDL code
Methodology & tools
Xilinx ISE v. 8.1iImplementation(mapping, placing
& routing)
SynopsysDesign Analyzer
X-2005.9
SynplicitySynplify Pro
v. 8.5
Logic synthesis
Aldec Active HDLModelSim Xilinx Edition
VHDL simulation& debugging
ASICFPGATechnology
All results afterplacing & routing
All results afterlogic synthesis
No physicalimplementation
Assumptions
• Only encryption/decryption, no MAC
• Maximum allowed key and IV sizes
• Key and IV need to be reloaded each time eitherof them changes
• No precomputations of internal state outsideof the circuit.
642264A5/1
2888080Trivium
288128256Phelix
320128128Mickey-128
1606480Grain
Internal state sizeIV sizeKey sizeCipher22
Three categories of stream ciphersrepresented among those implemented
Based onLFSRs / NFSRswith serial inputs
Based onLFSRs / NFSRswith parallel inputs
NFSR LFSR NFSR LFSR
Based on basic iterativearchitecture and componentoperations of block ciphersand hash functions
Phelix, AES in OFB or CTR mode
Grain, Trivium, A5/1 Mickey-128
Optimizations for the first group of ciphersGrain
d=1 d=2
Si+80=si+62 + si+52 + si+38 + si+23 + si+13 + si
Si+81=si+63 + si+53 + si+39 + si+24 + si+14 + si+1
si si+79
si+80 si+80
si+81
sisi+1 si+79
HB
HB
HB
HB
Key mixing Encryption
HB HB
Key mixing Encryption
HB
HB
Key mixing Encryption
HB
Key mixing Encryption
2 x number of clock cycles
Block function,Sharing
Block function,No sharing
Half-Blockfunction,Sharing
Half-Block function,No sharing
Optimizations for the third group of ciphersPhelix
Ease of design as perceivedby students
based on the specification of each cipher
Average score( 5 – very easy,1 – very difficult)
Number of studentswho selected the
cipher as their first choice
Trivium
Mickey-128
Grain
Phelix
3.36
3.32
3.00
2.00
5
3
4
0
0
2000
4000
6000
8000
10000
12000
0 200 400 600 800 1000 1200 1400 Area[CLB slices]
Throughput[Mbit/s]
Phelix
Trivium
T64
T32
T16
Grain
Mickey-128
G16
G1
Best
Worst
Throughput vs. areaFPGA: Xilinx Spartan 3 family
AES
0
200
400
600
800
1000
1200
1200 1250 1300 1350 1400 Area[CLB slices]
Throughput[Mbit/s]
Block function,No sharing
Block function,Sharing
Half-Block function,Sharing
Half-Block function,No sharing
Throughput vs. area: PhelixFPGA: Xilinx Spartan 3 family
0
2000
4000
6000
8000
10000
12000
0 50 100 150 200 250 300 350 400Area
[CLB slices]
Throughput[Mbit/s]
G1 G2 G4
G8
G16
M
T64
T32
T16
T8
T1-4
A – A5/1G – GrainM – Mickey-128T – Trivium
Legend:
Throughput vs. area: Grain, Mickey-128, Trivium, A5/1FPGA: Xilinx Spartan 3 family
0
500
1000
1500
2000
2500
3000
0 50 100 150 200 250 300 350Area
[CLB slices]
Throughput[Mbit/s]
G16
M
T16
G8
T8
T4
T2
T1
G4
G2G1A1
A3 A4
A – A5/1G – GrainM – Mickey-128T – Trivium
Legend:
Throughput vs. area: Throughput up to 3 Gbit/sFPGA: Xilinx Spartan 3 family
0
100
200
300
400
500
600
700
800
0 200 400 600 800 1000 1200Area
[CLB slices]
Throughput[Mbit/s]
Mickey-128
Trivium-1
Grain-1A5/1-1
Phelix (Half-block, No sharing)
Optimizations for minimum areaFPGA: Xilinx Spartan 3 family
0
200
400
600
800
1000
1200
Area[CLB slices]
Mickey-128(d=1)
Trivium(d=1)
Grain(d=1)
A5/1(d=1)
Phelix(d=32, HB-NS)
57122
188230
1216
x 0.47x 1.54
x 1.00
x 1.89
x 9.97
Optimizations for minimum areaFPGA: Xilinx Spartan 3 family
x cipher area/Grain area
0
2000
4000
6000
8000
10000
12000
0 200 400 600 800 1000 1200Area
[CLB slices]
Throughput[Mbit/s]
Mickey-128
Trivium-64
Grain-16
A5/1-1
Phelix (Full block, Sharing)
Optimizations for maximum throughput to area ratioFPGA: Xilinx Spartan 3 family
0.00
5.00
10.00
15.00
20.00
25.00
30.00
Mickey-128(d=1)
Trivium(d=64)
Grain(d=16)
A5/1(d=1)
Phelix(d=32, FB-SH)
Throughput/Area[Mbit/sCLB slices]
31.34
6.97
3.050.96 0.83
x 4.5
x 10.3x 32.5 x 37.7
x 1.0
x Trivium ratio / cipher ratio
Optimizations for maximum throughput to area ratioFPGA: Xilinx Spartan 3 family
0
1000
2000
3000
4000
5000
6000
Mickey-128Trivium Grain A5/1 Phelix
k= 1 2 4 8 16 32 64 1 2 4 8 16 1 3 4 32
Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family
Setup time[ns]
0
200
400
600
800
1000
1200
2000
4000
6000
8000
10000
12000
Clock cycles Nanoseconds
Mickey-128(k=1)
Trivium(k=1)
Grain(k=1)
A5/1(k=1)
Phelix(k=32)
0
Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family
0
50
100
150
200
250
300
350
400
Mickey-128(k=1)
Trivium(k=64)
Grain(k=16)
A5/1(k=1)
Phelix(k=32)
0
500
1000
1500
2000
2500
3000
3500
4000
Setup Time = Key & IV Loading + Initialization TimeFPGA: Xilinx Spartan 3 family
Clock cycles Nanoseconds
Results forASICs: TSMC 90 nm library
Relative results comparable to results for FPGAs
Absolute speed increase by a factor from 3 to 10before ASIC layout synthesis
Conclusions• Very large differences among candidate ciphers
(much larger than for five final candidates in the AES contest)
Possible reasons:• variety of ciphers based on different design principles• different internal state, key, and IV sizes• early stage of the contest
Trivium and Grain outperform other eSTREAM ciphersin terms of
• flexibility• minimum area• maximum throughput to area ratio.
Once again ciphers based on LFSR and NFSRs show theirsuperiority in hardware implementations
Security analysis should focus first on the most efficient ciphers