RTL Implementations and FPGA Benchmarking of Three Authenticated Ciphers Competing in CAESAR Round Two
William Diehl and Kris Gaj
ECE Department, George Mason University, Fairfax, Virginia, USA
http://cryptography.gmu.edu
Based on work supported by the National Science Foundation under Grant No. 1314540
Outline
• CAESAR Competition and Authenticated Ciphers
• CAESAR Hardware API & Compliant Code Development
• Discussion of Designs
• Results
• Summary, Conclusions, and Lessons Learned
Cryptographic Standard Contests
• AES (IX.1997 – X.2000): 15 block ciphers → 1 winner
• NESSIE (I.2000 – XII.2002)
• CRYPTREC
• eSTREAM (XI.2004 – IV.2008): 34 stream ciphers → 4 HW winners + 4 SW winners
• SHA-3 (X.2007 – X.2012): 51 hash functions → 1 winner
• CAESAR (I.2013 – XII.2017): 57 authenticated ciphers → multiple winners
Authenticated Ciphers
Combine the functionality of confidentiality, integrity, and authenticity
Notation: Npub = Public Message Number; (Enc) Nsec = (Encrypted) Secret Message Number; AD = Associated Data
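To make the notation concrete, the sketch below shows the input/output contract of an authenticated cipher: encryption takes a key, Npub, AD, and a message and returns a ciphertext plus a tag; decryption releases the message only if the tag verifies. It is a toy encrypt-then-MAC built from HMAC-SHA256, chosen purely for illustration. It is an assumption, not any CAESAR candidate, and not meant for real use. Nsec is omitted, and all function names are hypothetical.

```python
import hashlib
import hmac

TAG_BYTES = 16  # 128-bit tag, the tag size used by the ciphers in this talk


def _subkey(key: bytes, label: bytes) -> bytes:
    # Toy key derivation: separate encryption and MAC keys from one master key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(enc_key: bytes, npub: bytes, length: int) -> bytes:
    # Toy counter-mode keystream: HMAC-SHA256(enc_key, npub || counter).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, npub + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _tag(mac_key: bytes, npub: bytes, ad: bytes, ct: bytes) -> bytes:
    # Authenticate Npub, AD, and ciphertext; length prefixes keep field boundaries unambiguous.
    data = b"".join(len(x).to_bytes(8, "big") + x for x in (npub, ad, ct))
    return hmac.new(mac_key, data, hashlib.sha256).digest()[:TAG_BYTES]


def aead_encrypt(key: bytes, npub: bytes, ad: bytes, msg: bytes):
    ks = _keystream(_subkey(key, b"enc"), npub, len(msg))
    ct = bytes(m ^ k for m, k in zip(msg, ks))
    return ct, _tag(_subkey(key, b"mac"), npub, ad, ct)


def aead_decrypt(key: bytes, npub: bytes, ad: bytes, ct: bytes, tag: bytes):
    expected = _tag(_subkey(key, b"mac"), npub, ad, ct)
    if not hmac.compare_digest(expected, tag):
        return None  # authentication failure: no plaintext is released
    ks = _keystream(_subkey(key, b"enc"), npub, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))


if __name__ == "__main__":
    key, npub = bytes(16), bytes(range(16))
    ct, tag = aead_encrypt(key, npub, b"header", b"attack at dawn")
    print(aead_decrypt(key, npub, b"header", ct, tag))    # b'attack at dawn'
    print(aead_decrypt(key, npub, b"tampered", ct, tag))  # None
```

Changing the AD, the ciphertext, or the tag makes decryption return nothing. This is the integrity and authenticity half of the contract; confidentiality comes from the keystream.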
Evaluation Criteria
• Security
• Software Efficiency: µProcessors, µControllers
• Hardware Efficiency: FPGAs, ASICs
• Simplicity
• Flexibility
• Licensing
Motivation for Universal API
• The hardware API can strongly influence the area and throughput-to-area ratio of every implementation
• A hardware API is typically much more difficult to modify than a software API
• Without a comprehensive hardware API, comparisons are highly unreliable and potentially unfair
• Designers can “play to strengths” and “hide weaknesses”
Conclusion: A fair evaluation of hardware implementations is impossible without a standardized interface and protocol
CAESAR Hardware API
Specifies:
• Minimum compliance criteria
• Interface
• Communication protocol
• Timing characteristics
Assures:
• Compatibility
• Fairness
Timeline:
• Based on the GMU Hardware API presented at CryptArchi 2015, DIAC 2015, and ReConFig 2015
• Revised version posted on Feb. 15, 2016
• Officially approved by the CAESAR Committee on May 6, 2016
General Interface and Internal Architecture for High-Speed Implementations
Additional detail available at https://cryptography.gmu.edu/athena/
GMU Support for Designers of VHDL/Verilog Code
Implementer’s Guide
• v1.0, May 12, 2016
Development Package
a. VHDL code of generic pre-processing and post-processing units for high-speed implementations (src_rtl)
b. Universal testbench (AEAD_TB)
c. Python app used to automatically generate test vectors (aeadtvgen)
d. Six reference high-speed implementations of dummy authenticated ciphers
https://cryptography.gmu.edu/athena/index.php?id=download
The API Compliant Code Development
(Flow diagram, summarized)
• Specification → Manual Design → HDL Code, built on the Development Package src_rtl
• Reference C Code → Test Vectors, generated with the Development Package aeadtvgen
• HDL Code + Test Vectors → Functional Verification with the Development Package AEAD_TB → Pass/Fail
• HDL Code → FPGA Tools with Automated Optimization → Preliminary Post Place & Route Results (Resource Utilization, Max. Clock Frequency)
• Results combined with Formulas for the Execution Time & Throughput
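The flow ends with formulas for execution time and throughput. A minimal sketch of such formulas, assuming a simple model in which a message of n blocks costs a fixed number of overhead cycles plus a constant number of cycles per block. The overhead term, the function names, and the example numbers are hypothetical, not values from this work.

```python
def execution_time_ns(n_blocks: int, cycles_per_block: int,
                      overhead_cycles: int, f_mhz: float) -> float:
    """Time to process n_blocks, assuming fixed overhead plus constant cycles per block."""
    total_cycles = overhead_cycles + n_blocks * cycles_per_block
    return total_cycles * 1000.0 / f_mhz  # one cycle at f MHz lasts 1000/f ns


def throughput_mbps(block_bits: int, cycles_per_block: int, f_mhz: float) -> float:
    """Long-message throughput: block size times clock frequency over cycles per block."""
    return block_bits * f_mhz / cycles_per_block


def tp_per_area(tp_mbps: float, area_luts: int) -> float:
    """Throughput-to-area ratio in (Mbit/s)/LUT."""
    return tp_mbps / area_luts


# Hypothetical design: 128-bit blocks, 20 cycles/block, 200 MHz, 5 overhead cycles.
print(throughput_mbps(128, 20, 200.0))      # 1280.0
print(execution_time_ns(10, 20, 5, 200.0))  # 1025.0
```

For long messages the overhead term becomes negligible, which is why throughput is usually quoted as block size × frequency / cycles per block.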
My Tasks
Ciphers
• SCREAM, POET, Minalpher
• OMD (not presented)
Compliance
• Round Two published specification
• C reference code
• CAESAR HW API
Optimization Criteria
1. Throughput-to-area (TP/A) ratio
2. Throughput (maximize frequency; minimize cycles/block)
3. Area (minimize LUTs)
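When several architectures of the same cipher are available, the three criteria above can be applied in priority order. A sketch, assuming the criteria are compared lexicographically (TP/A first, then throughput, then smaller area); the `Variant` class and the variant figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    tp_mbps: float
    area_luts: int

    @property
    def tp_a(self) -> float:
        return self.tp_mbps / self.area_luts


def best_variant(variants):
    # Priority: highest TP/A, then highest throughput, then smallest area.
    return max(variants, key=lambda v: (v.tp_a, v.tp_mbps, -v.area_luts))


variants = [
    Variant("hypothetical_A", 1000.0, 2000),  # TP/A = 0.5
    Variant("hypothetical_B", 1500.0, 3000),  # TP/A = 0.5, higher throughput
    Variant("hypothetical_C", 1200.0, 3000),  # TP/A = 0.4
]
print(best_variant(variants).name)  # hypothetical_B
```

Exact TP/A ties are rare with measured results, so a practical version might round TP/A before comparing in order to let the secondary criteria take effect.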
SCREAM
Side-Channel Resistant Authenticated Encryption with Masking
• Based on Liskov, Rivest, and Wagner’s “Tweakable Block Cipher”
• Unique tweak for every block
• 128-bit key, state variable, tag
• Padding for Associated Data
SCREAM (cont’d)
Cryptographic primitive is a Tweakable Block Cipher (TBC) (EK)
10 steps (σ) per block, 2 rounds (ρ) per step
• Tweak updated once per step to form the tweak key (TK)
Non-linear substitution layer composed of 8x8 “nearly involute” S-Boxes
16-bit round constant RC(ρ,σ) applied in each round
Linear permutation layer: L-Box
Bus widths are 128 bits unless indicated.
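In a basic iterative datapath, a controller walks through these steps and rounds at one round per clock cycle. The sketch below enumerates that control sequence for one block using the slide's parameters (10 steps, 2 rounds per step). Placing the tweak-key update in the first round of each step is an assumption for illustration, and the function is hypothetical. The actual S-Box, L-Box, round-constant, and tweak-schedule definitions come from the SCREAM specification and are not modeled.

```python
STEPS = 10           # sigma = 0..9: steps per block (from the slide)
ROUNDS_PER_STEP = 2  # rho = 0..1: rounds per step (from the slide)


def basic_iterative_schedule():
    """Return one entry per clock cycle: (cycle, sigma, rho, tk_update).

    Assumes one round per clock cycle and that the tweak key is refreshed in
    the first round of every step (illustrative placement only). In hardware,
    (sigma, rho) would index the round-constant generator RC(rho, sigma).
    """
    schedule = []
    cycle = 0
    for sigma in range(STEPS):
        for rho in range(ROUNDS_PER_STEP):
            tk_update = rho == 0  # tweak updated once per step
            schedule.append((cycle, sigma, rho, tk_update))
            cycle += 1
    return schedule


sched = basic_iterative_schedule()
print(len(sched))                        # 20 round cycles per block
print(sum(1 for *_, tk in sched if tk))  # 10 tweak-key updates
print(sched[:3])  # [(0, 0, 0, True), (1, 0, 1, False), (2, 1, 0, True)]
```

The 20 cycles count only round computation; loading inputs and producing outputs add overhead that this sketch ignores.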
Basic Iterative versus Unrolled Architectures
(Figure: Basic Iterative vs. Unrolled x2)
Typically, the TP/A ratio decreases when a basic iterative architecture is unrolled
Source: “Throughput vs. Area trade-offs in High-speed Architectures of Five Round 3 SHA-3 Candidates Implemented Using Xilinx and Altera FPGAs,” Homsirikamol, Rogawski, Gaj, 2011
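A back-of-the-envelope model shows why unrolling usually lowers TP/A. Unrolling by k divides the round cycles by k, but the replicated round logic increases area and the longer critical path lowers the clock frequency. The sketch below assumes SCREAM's 20 rounds per block (10 steps of 2 rounds) and entirely hypothetical area and frequency figures. It is not derived from the measured results.

```python
import math


def design_point(rounds: int, k: int, block_bits: int,
                 f_mhz: float, area_luts: float):
    """Throughput (Mbit/s) and TP/A for a datapath evaluating k rounds per clock cycle."""
    cycles = math.ceil(rounds / k)      # round cycles per block
    tp = block_bits * f_mhz / cycles    # overhead cycles ignored
    return tp, tp / area_luts


# Hypothetical figures: unrolling x2 nearly doubles the round logic and
# lengthens the critical path, so the clock frequency drops.
tp1, ratio1 = design_point(rounds=20, k=1, block_bits=128, f_mhz=300.0, area_luts=2000)
tp2, ratio2 = design_point(rounds=20, k=2, block_bits=128, f_mhz=180.0, area_luts=3500)
print(round(tp1), round(tp2))              # 1920 2304
print(round(ratio1, 3), round(ratio2, 3))  # 0.96 0.658
```

With these numbers throughput rises by 20% while TP/A falls, matching the typical trend. With a smaller frequency drop or a smaller area increase, the outcome could differ.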
SCREAM – CipherCore Datapath
Notation:
Auth = Authenticator
Sum = Checksum
Trunc = Truncation
T0 = Initial Tweak
EK = Tweakable Block Cipher
npub = Public Message Number
exp_tag = expected tag
Interesting Features:
• Initial Tweak = f(npub, counter, type & length of block)
• Tag = f(Auth, EK[Sum])
• Truncation of the output for partial final plaintext and ciphertext blocks
Bus width of thick wires is 128 bits unless indicated. Bus width of thin wires is 1 bit.
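The tag and truncation features can be illustrated with a ΘCB-style sketch. The assumptions are: Sum is the XOR of the plaintext blocks, with the partial final block padded with 10*; the function f combining Auth and EK[Sum] is XOR; and a partial final block is processed by XORing it with a truncated cipher output. `toy_ek` is a hypothetical hash-based stand-in for the tweakable block cipher, not SCREAM's TBC, and the tweak byte strings are arbitrary.

```python
import hashlib

BLOCK = 16  # 128-bit block, as on the slide


def toy_ek(key: bytes, tweak: bytes, block: bytes) -> bytes:
    # HYPOTHETICAL stand-in for E_K with tweak T; NOT the SCREAM TBC.
    return hashlib.sha256(key + tweak + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad10(blk: bytes) -> bytes:
    # Assumed 10* padding of a partial block to a full block.
    return blk + b"\x80" + bytes(BLOCK - len(blk) - 1)


def checksum(plain_blocks) -> bytes:
    # Sum = XOR of all plaintext blocks, with the partial final block padded.
    s = bytes(BLOCK)
    for blk in plain_blocks:
        s = xor(s, blk if len(blk) == BLOCK else pad10(blk))
    return s


def final_partial_block(key: bytes, tweak: bytes, blk: bytes) -> bytes:
    # Encrypt the block length and keep only len(blk) bytes of the output (truncation).
    # The same operation serves for both encryption and decryption.
    pad = toy_ek(key, tweak, len(blk).to_bytes(BLOCK, "big"))
    return xor(blk, pad[:len(blk)])


def make_tag(key: bytes, tag_tweak: bytes, auth: bytes, plain_blocks) -> bytes:
    # Tag = Auth XOR E_K[Sum] (XOR assumed for the function f on the slide).
    return xor(auth, toy_ek(key, tag_tweak, checksum(plain_blocks)))


key, tweak = bytes(16), b"T-final"
ct = final_partial_block(key, tweak, b"hello")        # 5-byte final block
print(len(ct))                                        # 5
print(final_partial_block(key, tweak, ct) == b"hello")  # True
```

The truncation is why no ciphertext expansion occurs for a partial final block. In hardware, however, it means the output path must mask a variable number of bytes.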
POET
• Pipelineable On-line Encryption with Authentication Tag
• “Cipher Agnostic” – uses any 128-bit block cipher and ε-AXU keyed hash function
• AES-128 is used as the block cipher, and AES-4 as the keyed hash
• 128-bit key, state variable, tag
• Padding for Associated Data
POET – CipherCore Datapath
Interesting Features:
• Requires three subkeys (L, K, Kf) and round key generation in AES
• L subkey multiplied by 2 in GF(2^128) during header processing
• Variable shifts for tag generation and verification
(Figure: datapath with AES and AES4 cores, a GF(2^128) x2 multiplier, variable shifters, truncation units, and tag comparison)
Notation:
x2 = GF(2^128) x2 field multiplier
τ = Authenticator
Σ = Associated Data cumulative sum
>> (<<) = Variable right (left) shift
|M| = length of message
S = EK(|M|)
Bus width of thick wires is 128 bits unless indicated. Bus width of thin wires is 1 bit.
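Multiplying L by 2 in GF(2^128) is cheap in hardware: it is a 1-bit left shift plus a conditional XOR with a constant, a handful of XOR gates rather than a full multiplier. A sketch, assuming the reduction polynomial x^128 + x^7 + x^2 + x + 1 and the big-endian bit ordering used by, for example, CMAC. POET's exact convention should be checked against its specification.

```python
MASK128 = (1 << 128) - 1
R = 0x87  # x^7 + x^2 + x + 1: low-order terms of x^128 + x^7 + x^2 + x + 1


def gf128_double(x: int) -> int:
    """Multiply a 128-bit field element by x (i.e. by 2) in GF(2^128)."""
    carry = x >> 127                # bit shifted out of position 127
    x = (x << 1) & MASK128          # 1-bit left shift
    return x ^ (R if carry else 0)  # conditional reduction


def gf128_double_bytes(block: bytes) -> bytes:
    """Same operation on a 16-byte block interpreted big-endian."""
    y = gf128_double(int.from_bytes(block, "big"))
    return y.to_bytes(16, "big")


print(hex(gf128_double(1 << 127)))  # 0x87
print(hex(gf128_double(1)))         # 0x2
```

In this convention the constant 0x87 touches only bits 0, 1, 2, and 7. The hardware doubler is therefore rewiring for the shift plus four XOR gates driven by the former top bit.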
Minalpher
• Uses Tweakable Even-Mansour (TEM) with Minalpher-P primitive
• 2 TEM cores for encrypt/decrypt and tag generation in parallel
• 128-bit key, 256-bit state, 128-bit tag
• Each final plaintext block must have *10 padding even if full
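Because the final plaintext block is padded even when it is already full, a message whose length is a multiple of the block size gains an extra block. This is one source of a mismatch between ciphertext and plaintext block counts. A byte-oriented sketch, assuming the padding is a 1 bit written as a 0x80 byte followed by zero bytes. The block size is a parameter; the 16-byte value in the example is illustrative, not Minalpher's block size.

```python
def pad_star10(msg: bytes, block_bytes: int) -> bytes:
    """Append a 1 bit (0x80) and zero bytes up to a block boundary, even if msg is full."""
    zeros = (-(len(msg) + 1)) % block_bytes
    return msg + b"\x80" + bytes(zeros)


def unpad_star10(padded: bytes) -> bytes:
    """Remove the padding: strip trailing zero bytes, then the 0x80 marker."""
    stripped = padded.rstrip(b"\x00")
    if not stripped or stripped[-1] != 0x80:
        raise ValueError("invalid padding")
    return stripped[:-1]


def n_blocks(length: int, block_bytes: int) -> int:
    return -(-length // block_bytes)  # ceiling division


for n in (15, 16, 17):
    padded = pad_star10(b"A" * n, 16)
    print(n, n_blocks(n, 16), len(padded) // 16)
# 15 1 1
# 16 1 2   <- a full final block still gets an extra padding block
# 17 2 2
```

On decryption, the padding has to be located and removed after the fact, which is the job of the “serial truncator” in the GMU datapath.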
Minalpher (cont’d)
Each Minalpher-P consists of
• S (SubNibbles) 4x4 S-Boxes,
• T (ShuffleRows),
• E (Round Constant),
• M (MixColumns)
• Decryption has reversed ‘E’ and ‘M’
17.5 rounds of Minalpher-P in one TEM
• Final “half round” is an extra S and T function
Bus widths of paths A and B are 128 bits.
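The 17.5-round structure can be written out as an operation sequence, which is also what a round counter in a one-round-per-cycle datapath would walk through. A sketch that lists the layers in the order they appear on the slide (S, T, E, M). That evaluation order is an assumption and should be checked against the Minalpher specification.

```python
FULL_ROUNDS = 17
FULL_ROUND = ("S", "T", "E", "M")  # order as listed on the slide (assumed)
HALF_ROUND = ("S", "T")            # final "half round": an extra S and T


def minalpher_p_ops():
    """Flat list of layer applications for one Minalpher-P call (17.5 rounds)."""
    ops = []
    for _ in range(FULL_ROUNDS):
        ops.extend(FULL_ROUND)
    ops.extend(HALF_ROUND)
    return ops


ops = minalpher_p_ops()
print(len(ops))         # 70
print(ops.count("S"))   # 18: 17 full rounds plus the half round
print(ops[-4:])         # ['E', 'M', 'S', 'T']
```

A datapath that spends one cycle on each full round and one on the half round would therefore need 18 cycles for the permutation alone, before any other per-block work.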
Minalpher – CipherCore Datapath
Interesting Features:
• Parallel encrypt/decrypt and tag generation using 2 TEM cores requires 19 cycles/block
• “Serial truncator” to remove the final *10 padding during decryption
Notation:
TEM = Tweakable Even-Mansour
A = Associated Data register
M = Plaintext/Ciphertext Register (TEM)
C = Plaintext/Ciphertext Register (TEM aux)
L = Block-specific mask
T = Cumulative Tag generation
npub = Public Message Number
exp_tag = expected tag
Bus width of thick wires is 128 bits unless indicated. Bus width of thin wires is 1 bit.
FPGA Devices & Tools
FPGA Devices:
• Xilinx Virtex-6: xc6vlx240tff1156-3
FPGA Tools:
Synthesis Tool: Xilinx XST 14.7
Implementation Tool: Xilinx ISE 14.7
Automated Optimization: ATHENa
Tool options:
No embedded memories and no embedded DSP units allowed inside the AEAD
Results of GMU Implementations – Virtex 6
(Figure: TP (Mbps) vs. Area (LUTs) for SCREAM (Basic Iterative), SCREAM (Unrolled x2), Minalpher, and POET; TP/A values shown: 0.506, 0.497, 0.478, 0.385)
SCREAM has the highest throughput-to-area (TP/A) ratio
• Basic iterative (1 round/clock cycle) has higher TP/A than Unrolled x2
POET has high TP but large area
• Several AES cores required
Minalpher is close to SCREAM in TP/A
Implementations by Other Groups
SCREAM (by Lubos Gaspar & Stephanie Kerckhof, CG UCL, INRIA)
• Full-block width custom interface
• No support for the CAESAR API Protocol
POET (by Amir Moradi, EmSec Ruhr-Universität Bochum)
• Full-block width custom interface
• No support for the CAESAR API Protocol
Minalpher (by Takeshi Sugawara, Minalpher Team/Mitsubishi Electric)
• Fully compliant with the CAESAR API
Differences in API Specifications
SCREAM
• Contains 5-bit “length” input and output fields (allows for partial blocks)
POET
• No input for the length of the final block (a key parameter in tag generation and verification)
• Only 1 output port (details of tag generation and verification left to a higher-level protocol)
Derived from: A. Moradi (2016) and S. Kerckhof, L. Gaspar (2015)
Comparison with non-GMU Implementations – Virtex 6
(Figure: TP (Mbps) vs. Area (LUTs) for SCREAM (Basic Iterative), SCREAM (Unrolled x2), Minalpher, and POET; TP/A values shown: GMU 0.506, 0.497, 0.478, 0.385; non-GMU 0.679, 0.575, 0.272, 0.636)
Minalpher (Mitsubishi) has the highest TP/A ratio
• 1 TEM core versus 2 TEM cores in the GMU design (39 cycles/block vs. 19 cycles/block)
SCREAM TP/A ratios are close to GMU
• Not compliant with the CAESAR HW API
Divergent TP/A for POET
• Not compliant with the CAESAR HW API
• Different choice of architecture
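The cycle counts above show the trade-off behind the Minalpher result. At equal clock frequency, 19 cycles/block gives about twice the throughput of 39 cycles/block, but the second TEM core also costs area, so TP/A can still favor the single-core design. A sketch of that arithmetic, in which the common frequency and the block size are hypothetical placeholders (real designs may also differ in frequency):

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, f_mhz: float) -> float:
    """Long-message throughput in Mbit/s, ignoring per-message overhead."""
    return block_bits * f_mhz / cycles_per_block


F_MHZ = 200.0     # hypothetical common clock frequency
BLOCK_BITS = 256  # hypothetical block size; the ratio below does not depend on it

two_cores = throughput_mbps(BLOCK_BITS, 19, F_MHZ)  # 2 TEM cores: 19 cycles/block
one_core = throughput_mbps(BLOCK_BITS, 39, F_MHZ)   # 1 TEM core: 39 cycles/block
speedup = two_cores / one_core
print(round(speedup, 2))  # 2.05
```

At equal frequency, the two-core design wins on TP/A only if its area is less than about 2.05 times that of the one-core design.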
Comparison with all Round Two Candidates
Relative Throughput/Area in Virtex 6 vs. AES-GCM
Throughput/Area of AES-GCM = 1.020 (Mbit/s)/LUT
E – Throughput/Area for Encryption
D – Throughput/Area for Decryption
A – Throughput/Area for Authentication Only
Default: Throughput/Area the same for all 3 operations
(Chart labels: 17 (Mitsubishi), 21 (GMU CERG), 25 (GMU CERG))
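The chart normalizes each candidate's TP/A to that of AES-GCM. A sketch of the normalization, assuming all TP/A values are in the same (Mbit/s)/LUT units. The 0.506 input is the highest GMU TP/A from the earlier results, and the dictionary layout is a hypothetical convenience that mirrors the E/D/A legend.

```python
AES_GCM_TPA = 1.020  # (Mbit/s)/LUT on Virtex 6, from the slide


def relative_tpa(tpa: dict) -> dict:
    """Normalize per-operation TP/A (keys 'E', 'D', 'A') to AES-GCM.

    A single value under the key 'default' means the same TP/A for all three operations.
    """
    if "default" in tpa:
        tpa = {op: tpa["default"] for op in ("E", "D", "A")}
    return {op: value / AES_GCM_TPA for op, value in tpa.items()}


print({op: round(v, 3) for op, v in relative_tpa({"default": 0.506}).items()})
# {'E': 0.496, 'D': 0.496, 'A': 0.496}
```

A relative value below 1 means the candidate achieves less throughput per LUT than AES-GCM on the same device.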
Summary & Conclusions
Three functionally correct high-speed implementations of CAESAR Round Two candidates, developed at the RTL level and compliant with the CAESAR HW API
• A substantial part of the GMU CERG benchmarking effort in support of CAESAR Round Two evaluations
SCREAM (with basic iterative architecture) had the highest TP/A among GMU CERG implementations
• Minalpher (Mitsubishi version) had the highest TP/A overall
Lessons Learned
Features of implemented ciphers negatively affecting their performance
• Variable Shifts and Truncations
• #Ciphertext blocks ≠ #Plaintext blocks
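Variable shifts are costly because a shift whose distance is known only at run time needs a multiplexer network. A fixed shift, by contrast, is just wiring. A logarithmic barrel shifter for an n-bit word uses log2(n) stages of n 2:1 multiplexers, which is 896 multiplexers for 128 bits. A sketch modeling that structure, with stage-by-stage shifting and a rough multiplexer count that ignores FPGA-specific LUT packing:

```python
def barrel_shift_right(x: int, distance: int, width: int = 128) -> int:
    """Logical right shift built from log2(width) fixed-shift stages, as in a barrel shifter."""
    stages = width.bit_length() - 1       # log2(width) when width is a power of two
    if (1 << stages) != width:
        raise ValueError("width must be a power of two")
    if not 0 <= distance < width:
        raise ValueError("distance out of range")
    x &= (1 << width) - 1                 # keep only `width` bits
    for s in range(stages):
        if (distance >> s) & 1:           # stage s: select the input shifted by 2**s
            x >>= 1 << s
    return x


def mux_count(width: int = 128) -> int:
    """2:1 multiplexers in a log-depth barrel shifter: width per stage, log2(width) stages."""
    return width * (width.bit_length() - 1)


print(mux_count(128))                               # 896
print(barrel_shift_right(0xF0, 4, width=8) == 0xF)  # True
```

Truncation to a run-time length has a similar cost, since each output byte needs a select or mask controlled by the length.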
One Stop Website
https://cryptography.gmu.edu/athena/index.php?id=download
OR
https://cryptography.gmu.edu/athena
and click on CAESAR
• VHDL/Verilog Code of CAESAR Candidates: Summary I
• VHDL/Verilog Code of CAESAR Candidates: Summary II
• ATHENa Database of Results: Rankings View
• ATHENa Database of Results: Table View
• Benchmarking of Round 2 CAESAR Candidates in Hardware:
Methodology, Designs & Results
• GMU Implementations of Authenticated Ciphers and Their Building
Blocks
• CAESAR Hardware API v1.0
Questions?