www.esat.kuleuven.be/cosic
KUL - COSIC HJ 94 – Lecture 10 Spring 2006
Lecture 10 – Back End tools: The complete mountain
Ingrid Verbauwhede
K.U.Leuven, COSIC
Overview
• Lecture 1: what is a system-on-chip
• Lecture 2: terminology for the different steps
• Lecture 3: models of computation
• Lecture 4: two MOCs: SDFG & control flow
• Lecture 5: control flow & FIR example
• Lecture 6: fixed-point refinement
• Lecture 7: architecture exploration
• Lecture 8: DSP processors
• Lecture 9: domain-specific processors – Viterbi example
• Lecture 10 – today: back-end, from VHDL to tape-out, applied to the crypto mountain
Outline & Goal
• From VHDL to tape-out, applied to the crypto mountain
• Reference: Chapter 8, "Implementation strategies for digital ICs" from the book "Digital Integrated Circuits, A design perspective" 2nd edition, by J. Rabaey, A. Chandrakasan, B. Nikolic, Prentice Hall 2003.
• P. Schaumont, and I. Verbauwhede, "Domain specific codesign for embedded security," IEEE Computer, vol. 36, no. 4, pp. 68-74, April 2003.
Skiing down a mountain
[Figure: design flow from the specification (SPW, Matlab, C++) down to the implementation targets ASIC, special-purpose, retargetable coprocessor, DSP processor, DSP-RISC and RISC. Steps along the way: algorithm transformations (pipelining, unrolling), memory transformations and optimizations (loop merging, compaction), floating-point to fixed-point (e.g. a 40-bit accumulator)]
What?
• “Back-end”: VHDL, Verilog, synthesis, FPGA
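[Figure: implementation targets ranging from hardware to software: ASIC, ASIP, special-purpose, retargetable coprocessor, DSP, DSP extensions to RISC, RISC, VLIW; integrated as a system-on-a-chip or a system in package]
• Software back end: C compilation, assembly optimization
• Hardware back end: Verilog/VHDL, Synopsys synthesis, Cadence place & route, FPGA download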
Implementation Choices
[Figure: Digital circuit implementation approaches. Custom; semicustom, split into cell-based (standard cells, compiled cells, macro cells) and array-based (pre-diffused gate arrays, pre-wired FPGAs)]
The Custom Approach
[Figure: Intel 4004 die, courtesy Intel]
Transition to Automation and Regular Structures
[Figure: die photos of the Intel 4004 (’71), 8080, 8085, 80286 and 80486, courtesy Intel]
Semicustom Design Flow
HDL → logic synthesis → floorplanning → placement → routing → tape-out
• Behavioral level: design capture (HDL); structural level: logic synthesis; physical level: floorplanning, placement, routing
• Verification: pre-layout simulation after design capture and after synthesis; circuit extraction and post-layout simulation after routing
• Design iteration loops run back from each verification step
• Timing closure!
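Timing closure means that after placement and routing every path still fits within the clock period. The tools check this with static timing analysis. A minimal sketch of the core idea is shown here: compute longest-path arrival times through a gate netlist, then take the slack against the clock period. The Node structure, the two-input limit and the delay values are assumptions for illustration only, not how Synopsys or Cadence tools represent a design.

```c
#include <stddef.h>

/* Hypothetical gate-level netlist node: propagation delay in ns and up to
 * two fan-in nodes (-1 = unused). Nodes must be in topological order. */
typedef struct {
    double delay_ns;
    int fanin[2];
} Node;

/* Longest-path arrival time at the output of every node. */
void arrival_times(const Node *n, size_t count, double *arrival)
{
    for (size_t i = 0; i < count; i++) {
        double latest_input = 0.0;            /* primary inputs switch at t = 0 */
        for (int k = 0; k < 2; k++) {
            int src = n[i].fanin[k];
            if (src >= 0 && arrival[src] > latest_input)
                latest_input = arrival[src];
        }
        arrival[i] = latest_input + n[i].delay_ns;
    }
}

/* Worst slack = clock period - latest arrival time over all nodes.
 * A negative value means the design has not reached timing closure. */
double worst_slack(const Node *n, size_t count, double t_clk_ns, double *arrival)
{
    arrival_times(n, count, arrival);
    double latest = 0.0;
    for (size_t i = 0; i < count; i++)
        if (arrival[i] > latest)
            latest = arrival[i];
    return t_clk_ns - latest;
}
```

Take a four-gate netlist whose critical path is 4 ns. It leaves 1 ns of slack at a 5 ns clock period and −0.5 ns at 3.5 ns. In the flow above, negative slack sends the design back through another iteration.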
Cell-based Design (or standard cells)
• Routing channel requirements are reduced by the presence of more interconnect layers
[Figure: rows of cells (logic cells and feedthrough cells) separated by routing channels, next to a functional module (RAM, multiplier, …)]
Standard Cell — Example
[Figure: standard-cell layout example, from Brodersen92]
Standard Cell – The New Generation
• Cell structure hidden under the interconnect layers
Standard Cell - Example
[Figure: 3-input NAND cell (from ST Microelectronics); C = load capacitance, T = input rise/fall time]
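Cell libraries commonly describe such a cell's delay as a table indexed by exactly these two parameters, and the timing tools interpolate between the entries. A minimal sketch follows, assuming a 2×2 table with invented numbers. Real tables are larger and come from the vendor's characterization, and cell_delay is a hypothetical name.

```c
/* Hypothetical 2x2 characterization table for one timing arc of a
 * 3-input NAND: delay (ns) indexed by input transition time T (ns)
 * and output load capacitance C (pF). Numbers invented for illustration. */
#define NT 2
#define NC 2
static const double t_index[NT] = {0.1, 0.5};    /* input rise/fall time T */
static const double c_index[NC] = {0.01, 0.1};   /* load capacitance C */
static const double delay_table[NT][NC] = {
    {0.10, 0.40},
    {0.20, 0.50},
};

/* Bilinear interpolation inside the table (no extrapolation handling). */
double cell_delay(double t, double c)
{
    double u = (t - t_index[0]) / (t_index[1] - t_index[0]);
    double v = (c - c_index[0]) / (c_index[1] - c_index[0]);
    return (1 - u) * (1 - v) * delay_table[0][0]
         + (1 - u) * v       * delay_table[0][1]
         + u       * (1 - v) * delay_table[1][0]
         + u       * v       * delay_table[1][1];
}
```

At the center of the table (T = 0.3 ns, C = 0.055 pF) the result is the average of the four entries, 0.30 ns. A heavier load or a slower input edge moves the delay toward the larger entries.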
The security mountain
Security is as strong as the weakest link!
[Figure: a smart card / SIM containing a CPU, a crypto coprocessor, memory and JCA/Java on a JVM (or KVM), with clock and Vcc pins; security services: identification, confidentiality, integrity]
• Skiing down the security mountain:
  – Protocol: WLAN protocols, e-ID
  – Algorithm: public key, secret key, hash, biometrics (cipher design)
  – Architecture: co-design, HW/SW, SOC, SW (e.g. point counting algorithms)
  – Micro-architecture: coprocessor design
  – Circuit: circuit techniques to combat side-channel analysis attacks
Why a hard engineering problem?
• It is more difficult to guarantee that something will not happen (an attack) than that something will happen.
• Engineers are trained to make something happen.
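Example Spec: 802.11 WLAN – CCM mode
• Encryption & MIC creation (MIC = Message Integrity Check)
[Figure: CCM applied to an 802.11 frame. Clear-text frame fields: FC, Dur, A1, A2, A3, A4, SC, QC, PC, Data, MIC. CBC-MAC: a chain of AES_E(K) blocks, starting from an IV block (Flag, Nonce, Dlen) and continuing over the header (Hlen) and the zero-padded data, produces the MIC. Counter mode: preloaded counter blocks (Flag, Nonce, Cnt) encrypted with AES_E(K) produce the pads Pl(0), Pl(1), Pl(2), …, Pl(C), which are XOR-ed with the data and MIC to form the transmitted encrypted frame, followed by the FCS]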
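The CCM construction in the figure uses the same block cipher twice. CBC-MAC computes the MIC over the clear text, and counter mode encrypts the data and the MIC. The sketch below shows only that structure. Assumptions: toy_block_encrypt() is a hypothetical, insecure stand-in for AES_E(K). The exact 802.11 layout of the IV and counter blocks, and the authentication of the header fields (Hlen), are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* HYPOTHETICAL placeholder for AES_E(K): NOT a real cipher, NOT secure.
 * CCM only ever uses the forward (encrypt) direction of the block cipher. */
static void toy_block_encrypt(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
    for (int i = 0; i < BLK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 167 + in[(i + 1) % BLK]);
}

/* CBC-MAC: encrypt the IV block, then XOR in each zero-padded 16-byte
 * data block and encrypt again; the final block is the raw tag. */
static void cbc_mac(const uint8_t key[BLK], const uint8_t iv[BLK],
                    const uint8_t *data, size_t len, uint8_t tag[BLK])
{
    uint8_t x[BLK], y[BLK];
    toy_block_encrypt(key, iv, x);
    for (size_t off = 0; off < len; off += BLK) {
        for (size_t i = 0; i < BLK; i++)
            x[i] ^= (off + i < len) ? data[off + i] : 0;   /* zero padding */
        toy_block_encrypt(key, x, y);
        memcpy(x, y, BLK);
    }
    memcpy(tag, x, BLK);
}

/* Counter mode: XOR buf with the encrypted counter blocks; each block is
 * ctr0 with its last two bytes set to first, first + 1, ... */
static void ctr_xor(const uint8_t key[BLK], const uint8_t ctr0[BLK], unsigned first,
                    uint8_t *buf, size_t len)
{
    uint8_t ctr[BLK], pad[BLK];
    for (size_t off = 0; off < len; off += BLK) {
        unsigned cnt = first + (unsigned)(off / BLK);
        memcpy(ctr, ctr0, BLK);
        ctr[BLK - 2] = (uint8_t)(cnt >> 8);
        ctr[BLK - 1] = (uint8_t)cnt;
        toy_block_encrypt(key, ctr, pad);
        for (size_t i = 0; i < BLK && off + i < len; i++)
            buf[off + i] ^= pad[i];
    }
}

/* Seal: MIC over the clear text; counter 0 encrypts the 8-byte MIC
 * (802.11 CCMP uses an 8-byte MIC); counters 1, 2, ... encrypt the data. */
void ccm_seal(const uint8_t key[BLK], const uint8_t iv[BLK], const uint8_t ctr0[BLK],
              uint8_t *data, size_t len, uint8_t mic[8])
{
    uint8_t tag[BLK];
    cbc_mac(key, iv, data, len, tag);
    ctr_xor(key, ctr0, 0, tag, 8);
    memcpy(mic, tag, 8);
    ctr_xor(key, ctr0, 1, data, len);
}

/* Open: decrypt in place, recompute the MIC; returns 1 if it matches. */
int ccm_open(const uint8_t key[BLK], const uint8_t iv[BLK], const uint8_t ctr0[BLK],
             uint8_t *data, size_t len, const uint8_t mic[8])
{
    uint8_t tag[BLK];
    ctr_xor(key, ctr0, 1, data, len);
    cbc_mac(key, iv, data, len, tag);
    ctr_xor(key, ctr0, 0, tag, 8);
    return memcmp(tag, mic, 8) == 0;
}
```

The receiver runs the same two passes in the opposite order: decrypt first, then recompute and compare the MIC. This is why a hardware implementation needs only the AES encryption direction.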
Security algorithms
• Plaintext (data) is converted into ciphertext (indecipherable)
  – the transformation is controlled by a key
  – the transformation is invertible if the key is known
• Two main categories
  – symmetric key: Ek and Dk are easy to derive from each other, usually Ek = Dk
  – public key: given Ek, it is infeasible to calculate Dk
[Figure: Plaintext → Encryption (Ek) → Ciphertext → Decryption (Dk) → Plaintext]
Symmetric key algorithm
• Encryption and decryption keys are the same
  – the parties share a secret (key distribution problem!)
• Security is derived from the shared secret
  – algorithms are designed for efficiency (hardware and/or software)
• Two basic types
  – block ciphers and stream ciphers
[Figure: Message M → E_K(M) = C → Ciphertext C → D_K(C) = M → Plaintext M, with the same key K on both sides]
Block cipher
• Definition: “A block cipher breaks up the plaintext into strings (called blocks) of a fixed length t over an alphabet A and encrypts one block at a time.”
• In practice:
  – repeated iterations, called rounds
  – each round has a “subkey” derived by the key schedule
[Figure: n-bit input block → round 1 → round 2 → … → round r → n-bit output block; the key k enters the key schedule, which produces subkey 1, subkey 2, …, subkey r]
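The round/subkey structure can be sketched with a toy 32-bit cipher. The round function, the key schedule and all constants below are arbitrary choices made for this illustration, and they offer no security. The point is only that decryption runs the inverted rounds with the subkeys in reverse order.

```c
#include <stdint.h>

#define ROUNDS 8

static uint32_t rotl32(uint32_t x, unsigned s) { return (x << s) | (x >> (32 - s)); }
static uint32_t rotr32(uint32_t x, unsigned s) { return (x >> s) | (x << (32 - s)); }

/* Hypothetical key schedule: derive one subkey per round from the key. */
void key_schedule(uint32_t key, uint32_t subkey[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i++) {
        key = rotl32(key, 5) ^ (0x9E3779B9u * (uint32_t)(i + 1));
        subkey[i] = key;
    }
}

/* Each round: key addition (XOR subkey), then an invertible mixing step
 * (rotate, then add a constant modulo 2^32). Toy only, NOT secure. */
uint32_t encrypt_block(uint32_t x, const uint32_t subkey[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i++) {
        x ^= subkey[i];
        x = rotl32(x, 7) + 0x3C6EF372u;
    }
    return x;
}

/* Decryption undoes each step, last round first. */
uint32_t decrypt_block(uint32_t y, const uint32_t subkey[ROUNDS])
{
    for (int i = ROUNDS - 1; i >= 0; i--) {
        y = rotr32(y - 0x3C6EF372u, 7);
        y ^= subkey[i];
    }
    return y;
}
```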
Stream cipher
• The key stream is XOR-ed with the data stream
• Stream ciphers process one symbol (= one bit) at a time
• They have some form of memory (state)
• But block ciphers can be turned into stream ciphers and vice versa …
[Figure: two stream-cipher structures, each with a state updated by a function f and an output function g that uses the key to produce the key stream; the key stream is XOR-ed (+) with the data stream to give the cipher stream]
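Here is a minimal sketch of a synchronous stream cipher. It assumes a 16-bit Fibonacci LFSR as the state update f and takes the state's least significant bit as the output function g. The taps are those of a common maximal-length 16-bit LFSR. This toy generator is not secure. It only shows that XOR-ing the same key stream both encrypts and decrypts. StreamCipher and the sc_* functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t state; } StreamCipher;

/* Load the state from the key; an all-zero LFSR state would never change,
 * so key 0 is replaced by a fixed non-zero value. */
void sc_init(StreamCipher *sc, uint16_t key)
{
    sc->state = key ? key : 0xACE1u;
}

/* g: output bit = LSB of the state; f: advance the LFSR by one step
 * (taps for x^16 + x^14 + x^13 + x^11 + 1). */
static unsigned sc_next_bit(StreamCipher *sc)
{
    unsigned out = sc->state & 1u;
    unsigned fb = ((sc->state >> 0) ^ (sc->state >> 2) ^
                   (sc->state >> 3) ^ (sc->state >> 5)) & 1u;
    sc->state = (uint16_t)((sc->state >> 1) | (fb << 15));
    return out;
}

/* XOR the key stream into the data bit by bit. XOR is its own inverse,
 * so the same call encrypts and decrypts. */
void sc_xor(StreamCipher *sc, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t ks = 0;
        for (int b = 0; b < 8; b++)
            ks |= (uint8_t)(sc_next_bit(sc) << b);
        buf[i] ^= ks;
    }
}
```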
AES (FIPS 197, 11/2001)
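• Block cipher with 128-bit blocks
[Figure: AES round structure: the plaintext Pi first goes through KeyAdd; each round then applies ByteSub, ShiftRow, MixColumn and KeyAdd with round key Ki (K1 … KNr); the last round omits MixColumn]
• The input byte stream B0 B1 B2 … B15 is loaded column by column into the 4×4 AES state array:
    B0  B4  B8  B12
    B1  B5  B9  B13
    B2  B6  B10 B14
    B3  B7  B11 B15
• Variable key size: 128/192/256 bits; the key size dictates the number of rounds Nr (Nk = key length in 32-bit words)
    Key size   Nr   Nk
    AES-128    10   4
    AES-192    12   6
    AES-256    14   8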
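Two details of this slide translate directly into code. The round count follows from the key length as Nr = Nk + 6, and the input byte stream fills the state array column by column. The sketch below uses hypothetical function names.

```c
#include <stdint.h>

/* Number of rounds from the key length (assumes 128, 192 or 256 bits):
 * Nk = key length in 32-bit words, Nr = Nk + 6. */
int aes_rounds(int key_bits)
{
    int nk = key_bits / 32;   /* 4, 6 or 8 */
    return nk + 6;            /* 10, 12 or 14 */
}

/* Load 16 input bytes B0..B15 column by column into the 4x4 state:
 * byte Bi goes to row i % 4, column i / 4. */
void aes_load_state(const uint8_t in[16], uint8_t state[4][4])
{
    for (int i = 0; i < 16; i++)
        state[i % 4][i / 4] = in[i];
}
```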
AES: Byte substitution
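• Byte substitution: each byte individually
• 16 identical S-boxes (32 for Rijndael with its largest, 256-bit block)
[Figure: every byte ai of the state array is replaced by bi = S(ai); the S-box computes the inverse in GF(2^8), followed by an affine transformation]
    a0 a4 a8  a12        b0 b4 b8  b12
    a1 a5 a9  a13   →    b1 b5 b9  b13
    a2 a6 a10 a14        b2 b6 b10 b14
    a3 a7 a11 a15        b3 b7 b11 b15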
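The following is a software sketch of ByteSub for a single byte. It computes the GF(2^8) inverse directly as a^254, since a^255 = 1 for a ≠ 0, and then applies the FIPS-197 affine transformation. Exponentiation is a choice made here for compactness. Hardware designs typically use a look-up table or composite-field arithmetic instead. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Inverse as a^254 by square-and-multiply; 0 maps to 0, as AES requires. */
static uint8_t gf_inv(uint8_t a)
{
    uint8_t result = 1, base = a;
    for (unsigned e = 254; e; e >>= 1) {
        if (e & 1) result = gf_mul(result, base);
        base = gf_mul(base, base);
    }
    return a ? result : 0;
}

/* S-box: inversion, then the affine map
 * b = x ^ rotl(x,1) ^ rotl(x,2) ^ rotl(x,3) ^ rotl(x,4) ^ 0x63. */
uint8_t aes_sbox(uint8_t a)
{
    uint8_t x = gf_inv(a);
    uint8_t r = x;
    for (int i = 1; i <= 4; i++)
        r ^= (uint8_t)((x << i) | (x >> (8 - i)));
    return (uint8_t)(r ^ 0x63);
}
```

For example, S(0x00) = 0x63 and S(0x01) = 0x7C. Calling aes_sbox on each of the 16 state bytes implements the 16 identical S-boxes on the slide.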
AES: Shiftrow
• Shiftrow: circularly rotate each row of the state array (row r by r positions to the left)
• Easy wiring
    a0  a4  a8  a12                 a0  a4  a8  a12
    a1  a5  a9  a13    Shiftrow     a5  a9  a13 a1
    a2  a6  a10 a14       →         a10 a14 a2  a6
    a3  a7  a11 a15                 a15 a3  a7  a11
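Here is ShiftRow in C, assuming the state is stored as state[row][column]. The function name is hypothetical. In hardware the same operation needs no logic at all, only a different wiring of the bytes.

```c
#include <stdint.h>

/* Rotate row r of the 4x4 state left by r byte positions (row 0 unchanged). */
void aes_shift_rows(uint8_t state[4][4])
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[c] = state[r][(c + r) % 4];
        for (int c = 0; c < 4; c++)
            state[r][c] = tmp[c];
    }
}
```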
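AES: mix column
• Matrix multiplication of each state array column
  – multiply with constant entries (over GF(2^8), where + is XOR)
$$\begin{pmatrix} b_i \\ b_{i+1} \\ b_{i+2} \\ b_{i+3} \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} a_i \\ a_{i+1} \\ a_{i+2} \\ a_{i+3} \end{pmatrix}, \quad i \in \{0, 4, 8, 12\}$$
• Bit level (bits ordered b7 … b0):
$$2 \cdot a = (a_6, a_5, a_4, a_3, a_2, a_1, a_0, 0) \oplus (0, 0, 0, a_7, a_7, 0, a_7, a_7)$$
$$3 \cdot a = (a_7, a_6, a_5, a_4, a_3, a_2, a_1, a_0) \oplus (a_6, a_5, a_4, a_3, a_2, a_1, a_0, 0) \oplus (0, 0, 0, a_7, a_7, 0, a_7, a_7)$$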
Mix column - encryption
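[Figure: datapath for one output byte: GF(B × 1) = B; GF(B × 2) = B shifted left by 1, XOR-ed with G(x) = 00011011 when the carry out is 1; GF(B × 3) = GF(B × 2) ⊕ B; XOR adders combine the products of the four column bytes a, b, c, d according to the coefficient matrix]
• The Mix Column operation is a GF(2^8) linear transform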
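The byte-level datapath maps directly onto code. Multiplication by 2 is a left shift with a conditional XOR of 0x1B, and multiplication by 3 adds (XORs) the original byte. Below is a sketch for one column, using hypothetical function names.

```c
#include <stdint.h>

/* x2 in GF(2^8): shift left; if the carry (old bit 7) was set, XOR with
 * 0x1B = 00011011, the low byte of G(x) = x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* x3 = x2 XOR x1. */
static uint8_t mul3(uint8_t b) { return (uint8_t)(xtime(b) ^ b); }

/* MixColumn on one state column a0..a3 with matrix rows
 * 2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2; '+' is XOR. */
void aes_mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ mul3(a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ mul3(a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ mul3(a3));
    col[3] = (uint8_t)(mul3(a0) ^ a1 ^ a2 ^ xtime(a3));
}
```

The column (db, 13, 53, 45) maps to (8e, 4d, a1, bc), a widely used MixColumns test vector.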
Sbox optimization
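• GF(2^8) inversion requires large look-up tables
• Map to isomorphic composite fields, GF((2^4)^2) or GF(((2^2)^2)^2), and invert there
• Smaller but slower!
[Figure: composite-field inversion: A ∈ GF(2^8) is mapped to a pair (ah, al) over GF(2^4), inverted using GF(2^4) squaring, multiplication, the constant p0 and inversion, then mapped back to A^-1 ∈ GF(2^8)]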
Sbox experiment
• 0.18 µm CMOS, Synopsys experiment
• Size of 1 S-box, pushing synthesis for area or for speed
[Figure: area (gates, 0 to 2000) versus latency (ns, 0 to 8) for a direct implementation, GF(2^8), GF((2^4)^2) and the Wolkerstorfer design]
Example: Memory-Mapped AES
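[Figure: aes_core behind a memory-mapped interface with ports din, dout (32 bit) and ins (8 bit) at addresses 0x8000, 0x8004 and 0x8008; inside, aes_top has inputs ld, reset, key (128 bit) and text_in (128 bit), and outputs text_out (128 bit) and done]
SW:
    {
      volatile unsigned int *din = (volatile unsigned int *) 0x8000;
      *din = 0xaa55aa55;
    }
HW: aes_top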
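The SW side of such a memory-mapped coprocessor is a short driver. The sketch below assumes that din, dout and ins sit at consecutive 32-bit word offsets from 0x8000, as in the address map above. The instruction codes and the omitted handshake are hypothetical, since the lecture does not define the command set. Passing the register base as a pointer lets the same code run against a plain array in a host test.

```c
#include <stdint.h>

/* Word offsets from base 0x8000: din = 0x8000, dout = 0x8004, ins = 0x8008. */
enum { REG_DIN = 0, REG_DOUT = 1, REG_INS = 2 };

/* HYPOTHETICAL instruction codes; not defined in the lecture. */
enum { INS_LOAD_KEY = 1, INS_LOAD_TEXT = 2, INS_START = 3, INS_READ = 4 };

/* Send a 128-bit value through the 32-bit din port, one word at a time,
 * each followed by an instruction telling the core what the word is. */
static void aes_push128(volatile uint32_t *base, const uint32_t w[4], uint32_t ins)
{
    for (int i = 0; i < 4; i++) {
        base[REG_DIN] = w[i];
        base[REG_INS] = ins;
    }
}

/* Encrypt one block. On the target: base = (volatile uint32_t *)0x8000. */
void aes_encrypt_mm(volatile uint32_t *base, const uint32_t key[4],
                    const uint32_t text_in[4], uint32_t text_out[4])
{
    aes_push128(base, key, INS_LOAD_KEY);
    aes_push128(base, text_in, INS_LOAD_TEXT);
    base[REG_INS] = INS_START;
    /* A real driver would wait for the core to finish (the 'done' output
     * of aes_top); that handshake is not specified, so it is omitted. */
    for (int i = 0; i < 4; i++) {
        base[REG_INS] = INS_READ;
        text_out[i] = base[REG_DOUT];
    }
}
```

The volatile qualifier forces every store to reach the bus in program order, which is what the coprocessor interface relies on.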
AES-128 (128-bit key, 128-bit data): throughput, power and figure of merit
    Implementation                           Throughput    Power     Figure of merit (Gb/s/W)
    AES encryption on 0.18 µm CMOS           2 Gbit/s      56 mW     35.7 (1/1)
    AES encryption, FPGA [1]                 1.32 Gbit/s   490 mW    2.7 (1/13)
    AES encryption, asm, Pentium III [2]     648 Mbit/s    41.4 W    0.015 (1/2200)
    AES encryption in C, emb. Sparc [3]      133 kbit/s    120 mW    0.0011 (1/33 000)
    AES encryption in Java, emb. Sparc [4]   450 bit/s     120 mW    0.0000037 (1/9 600 000)
• Programmability ~ (energy efficiency)^-1
[1] Amphion CS5230 on Virtex2 + Xilinx Virtex2 Power Estimator
[2] Helger Lipmaa, hand-coded PIII assembly + Intel Pentium III (1.13 GHz) datasheet
[3] gcc, 1 mW/MHz @ 120 MHz Sparc, assumes 0.25 µm CMOS
[4] Java on KVM (Sun J2ME, non-JIT) on 1 mW/MHz @ 120 MHz Sparc, assumes 0.25 µm CMOS
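The figure of merit is throughput divided by power. The ratios in parentheses compare each entry with the ASIC. A small helper reproduces the numbers; the function names are hypothetical.

```c
/* Figure of merit: throughput per watt, in Gb/s/W. */
double fom_gbps_per_watt(double throughput_bps, double power_w)
{
    return (throughput_bps / 1e9) / power_w;
}

/* How many times less energy-efficient an implementation is than a
 * reference (the "1/13", "1/2200", ... entries). */
double efficiency_gap(double fom_ref, double fom)
{
    return fom_ref / fom;
}
```

For instance, 2 Gbit/s at 56 mW gives 35.7 Gb/s/W. The FPGA's 1.32 Gbit/s at 490 mW gives about 2.7 Gb/s/W, roughly 13 times lower.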
An Example: AES Encryption Processor
• Full Rijndael Encryption
• 0.18 µm CMOS
• 1.8V Core, 3.3V Pads
• 2.29 Gb/s at fclk 125 MHz
• AES Compliant
• 56 mW Core Power
• 173 Kgate
• Loosely Coupled Coprocessor
Circuit styles
Concepts
• Introduction to dynamic differential logic (also called dual rail with precharge)
• Full custom approach
• Standard cell approach
• Place & route approach
Intro to Static CMOS
• Consumes power when the output makes a 0 to 1 transition
    OUT transition   load capacitance   0→1 transition (supply energy)
    0 → 0            –                  0
    0 → 1            charge             1
    1 → 0            discharge          0
    1 → 1            –                  0
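A first-order model makes the data dependence explicit. Energy is drawn from the supply only on 0 → 1 output transitions, about C_L·Vdd² each. The sketch counts such transitions for an 8-bit register. The function names and the single common load C_L for every bit are simplifying assumptions.

```c
#include <stdint.h>

static int popcount8(uint8_t x)
{
    int n = 0;
    for (; x; x &= (uint8_t)(x - 1)) n++;
    return n;
}

/* Number of nodes that rise 0 -> 1 between two register values. */
int charging_events(uint8_t before, uint8_t after)
{
    return popcount8((uint8_t)(~before & after));
}

/* Energy drawn from the supply: one C_L * Vdd^2 per charging event. */
double switching_energy_j(uint8_t before, uint8_t after, double c_load_f, double vdd_v)
{
    return charging_events(before, after) * c_load_f * vdd_v * vdd_v;
}
```

Going from 0x00 to 0xFF charges eight nodes, while going from 0xFF to 0x00 charges none. That difference in supply current is exactly what a power-analysis attacker observes.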
Duplicate logic
• As suggested by famous cryptographers . . .
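    OUT      OUT'     events                          0→1 transitions
    0 → 0    1 → 1    –                               0
    0 → 1    1 → 0    OUT charges, OUT' discharges    1
    1 → 0    0 → 1    OUT discharges, OUT' charges    1
    1 → 1    0 → 0    –                               0
• The power consumption still depends on whether the data changed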
Dynamic logic
• Dynamic logic breaks input sequence
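[Figure: dynamic gate with a precharge transistor (Pr), an evaluation transistor (Ev) and a pull-down network (PDN) driven by the inputs]
    OUT (precharge)   OUT (evaluation)   events
    1                 1                  no transition
    1                 0                  discharge during evaluation, charge during the next precharge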
Transition independent power consumption …
• When logic values are represented by charging and discharging capacitances, every transition must use a fixed amount of energy:
  – switch once every cycle
  – switch a constant load capacitance
…which doesn’t create any side-channel information
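The three logic styles can be compared by counting 0 → 1 transitions per cycle for an 8-bit value. The minimal sketch below assumes every wire carries the same load. Single-rail static logic depends on the data. Duplicated static logic depends on how many bits changed. Dual rail with precharge always produces exactly one rising wire per bit. All function names are hypothetical.

```c
#include <stdint.h>

/* Count wires rising 0 -> 1 between two cycles. */
static int rising(uint8_t before, uint8_t after)
{
    int n = 0;
    for (uint8_t r = (uint8_t)(~before & after); r; r &= (uint8_t)(r - 1)) n++;
    return n;
}

/* Single-rail static CMOS: wires go directly from old to new data. */
int rising_single_rail(uint8_t old_data, uint8_t new_data)
{
    return rising(old_data, new_data);
}

/* Duplicated static logic (true and complement, no precharge):
 * equals the number of bits that changed. */
int rising_dual_rail_static(uint8_t old_data, uint8_t new_data)
{
    return rising(old_data, new_data)
         + rising((uint8_t)~old_data, (uint8_t)~new_data);
}

/* Dual rail with precharge: both rails are 0 after precharge, and in
 * evaluation exactly one rail per bit goes to 1. */
int rising_dual_rail_precharge(uint8_t old_data, uint8_t new_data)
{
    (void)old_data;                   /* precharge erases the history */
    uint8_t pre_t = 0, pre_f = 0;
    uint8_t ev_t = new_data, ev_f = (uint8_t)~new_data;
    return rising(pre_t, ev_t) + rising(pre_f, ev_f);
}
```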
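Dynamic and differential logic …
• … is necessary but not sufficient
  → balance the differential output nodes
  → (dis)charge all internal nodes
• E.g. DCVSL is not sufficient
[Figure: clocked DCVSL AND/NAND gate with inputs A, B and their complements; a (1,1) input and a (0,0) input (dis)charge different internal nodes]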
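Sense Amplifier Based Logic charges a constant load each cycle
[Figure: SABL AND/NAND gate: clocked sense-amplifier stage connected to VDD, a differential pull-down network on inputs A, B and their complements, and transistor M1]
• Balanced input and output nodes
• All internal nodes connect to an output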
Sense Amplifier Based Logic
[Figure: SABL AND/NAND gate for two input events; total switched capacitance Ctot = 19.32 fF and Ctot = 19.38 fF]
Implementation details
• Same circuit; two implementations
• Difference in logic style:
  – static CMOS
  – SABL
• 0.18 µm, 1.8 V CMOS technology
• 5000 encryptions
• HSPICE with a 10 ps simulation step
Supply current profile
• irregular ⇒ input dependent
• regular ⇒ input independent
[Tiri CHESS2003]
Standard building blocks
• (1) False output: built from the false inputs using De Morgan’s law (for Z = A·B, the false output is Z' = A' + B')
• (2) Both outputs are AND-ed with the precharge signal prch:
  – precharge (prch = 1): both outputs are 0
  – evaluation (prch = 0): exactly one output is 1
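The two steps can be modeled at the logic level. This sketch assumes prch is active high and enters the AND in inverted form, so that prch = 1 forces both outputs to 0 as the slide states. DualRail and the function name are hypothetical.

```c
#include <stdbool.h>

/* Dual-rail signal: t = true rail, f = false rail.
 * Valid data: exactly one rail is 1. Precharge state: both rails 0. */
typedef struct { bool t, f; } DualRail;

/* Dual-rail AND from standard cells:
 * (1) false output from the false inputs via De Morgan:
 *     not(A and B) = (not A) or (not B);
 * (2) both outputs AND-ed with the inverted precharge signal. */
DualRail dr_and_precharged(DualRail a, DualRail b, bool prch)
{
    DualRail z;
    z.t = (a.t && b.t) && !prch;
    z.f = (a.f || b.f) && !prch;
    return z;
}
```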
Wave Dynamic Differential Logic
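• Restrict the library to AND and OR gates
  – input 0 ⇒ output 0
  – no precharge operator: precharging the inputs to 0 propagates as a wave through the logic
[Figure: WDDL AND and OR gates; the register outputs feeding the encryption module are precharged under control of clk, with alternating evaluation and precharge phases]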
WDDL library
• All functions of the and2 and or2 operators
• In addition: inverted input and output signals
• Our WDDL library: 128 cells
[Figure: example WDDL cells. XOR2X4 built from AOI22X1, OAI22X1 and INVX4 cells (inputs A, B; output Y); OAI221X2 built from OAI221X1, AOI221X1 and INVX2 cells (inputs A0, A1, B0, B1, C0; output Y)]
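Here is a logic-level sketch of WDDL. It covers AND and OR gates on dual-rail signals, inversion by swapping rails, and an XOR. The XOR uses an AND-OR for the true rail and an OR-AND for the false rail, presumably the structure behind the AOI22 + INV and OAI22 + INV pair in the XOR2X4 cell above. The type and function names are hypothetical. Every function is built only from positive AND/OR terms, so all-zero inputs always give an all-zero output. This is why no precharge signal is needed inside the gates.

```c
#include <stdbool.h>

/* Dual-rail signal: t = true rail, f = false rail; (0,0) = precharge. */
typedef struct { bool t, f; } Wddl;

/* WDDL AND: true rail = AND of true rails, false rail = OR of false rails. */
Wddl wddl_and(Wddl a, Wddl b) { Wddl z = { a.t && b.t, a.f || b.f }; return z; }

/* WDDL OR: the dual of AND. */
Wddl wddl_or(Wddl a, Wddl b)  { Wddl z = { a.t || b.t, a.f && b.f }; return z; }

/* Inversion costs no logic: swap the two rails. */
Wddl wddl_not(Wddl a)         { Wddl z = { a.f, a.t }; return z; }

/* XOR: true rail  = A.t B.f + A.f B.t          (AND-OR),
 *      false rail = (A.t + B.f)(A.f + B.t)     (OR-AND),
 * which equals A.t B.t + A.f B.f for valid dual-rail inputs. */
Wddl wddl_xor(Wddl a, Wddl b)
{
    Wddl z = { (a.t && b.f) || (a.f && b.t), (a.t || b.f) && (a.f || b.t) };
    return z;
}
```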
Experimental results
• Measurement results for an FPGA test circuit
[Figure: measured output waveforms (out and its complement) for the single-ended and the WDDL implementation]
Unbalanced capacitive loads
• For constant power consumption: constant load capacitance.
• Match loads at differential outputs.
Load capacitance breakdown
$$C_A = C_{o,A} + C_{w,A} + C_{i,I_1} + \dots + C_{i,I_k} = C_{o,A'} + C_{w,A'} + C_{i,I_1'} + \dots + C_{i,I_k'} = C_{A'}, \qquad C_{w,A} = C_{w,A'}$$
Co: intrinsic output capacitance; Cw: interconnect capacitance; Ci: input capacitance
[Figure: a gate drives outputs A and A' through wires (Rw,A, Cw,A) and (Rw,A', Cw,A') to the inputs of gate 1 and gate 2]
• Intrinsic caps: matched
• Interconnect: dominant (Moore’s law)
• Balancing interconnect: crucial
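The breakdown above can be worked through as a quick calculation: sum the contributions per rail, then compare the two rails. The values in the example are hypothetical, except the 19.32 fF and 19.38 fF totals quoted for the SABL gate. The function names are also hypothetical.

```c
#include <stddef.h>

/* Total load on one rail (fF): intrinsic output cap + wire cap + the
 * input caps of the k gates it drives. */
double rail_load_ff(double c_out, double c_wire, const double *c_in, size_t k)
{
    double c = c_out + c_wire;
    for (size_t i = 0; i < k; i++)
        c += c_in[i];
    return c;
}

/* Relative mismatch between the two rails of a differential pair. */
double load_mismatch(double c_a, double c_a_bar)
{
    double diff = c_a > c_a_bar ? c_a - c_a_bar : c_a_bar - c_a;
    return diff / (c_a > c_a_bar ? c_a : c_a_bar);
}
```

With matched intrinsic and input capacitances, any difference comes from the wires. For example, 5 fF versus 9 fF of interconnect turns a 10 fF load and a 14 fF load into a 29% mismatch. The SABL totals above differ by only about 0.3%.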
Place & Route approach
• Parallel routes (adjacent tracks, same layer) balance geometric distances and parasitic effects
• Resistance: equal vias, wire segments
• Capacitance (to other layers): ideally the same environment; exact if every other layer is a power plane
[Figure legend: metal x, metal y, via xy]
Differential pair routing
• Available via gridless/shape-based routers
  – intended for only a few critical signals (e.g. clock)
  – experiment with 200 pairs: 8 hours CPU, 1000 conflicts, 100 open nets
• Gridded routers avoid wires in parallel
• We propose “fat”-wire routing:
  – abstract the differential pair as one single fat wire
  – route with the fat wire; then decompose it into the pair
Design example
• Two normal wires replace each fat wire.
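Here is a minimal sketch of the decomposition step. It rests on an assumed geometry that the slides do not specify: the fat wire is routed on a coarse grid with twice the track pitch, and each coarse point expands into two adjacent fine-grid tracks. Because the false wire is a translated copy of the true wire, both get the same length, segments and vias. Point and the function names are hypothetical.

```c
#include <stddef.h>

typedef struct { int x, y; } Point;

/* ASSUMED geometry: coarse pitch = two fine tracks. The true wire takes
 * fine point (2x, 2y), the false wire the adjacent track (2x+1, 2y+1). */
void decompose_fat_wire(const Point *fat, size_t n, Point *wire_t, Point *wire_f)
{
    for (size_t i = 0; i < n; i++) {
        wire_t[i].x = 2 * fat[i].x;
        wire_t[i].y = 2 * fat[i].y;
        wire_f[i].x = 2 * fat[i].x + 1;
        wire_f[i].y = 2 * fat[i].y + 1;
    }
}

/* Manhattan length of a rectilinear path (sum of segment lengths). */
int path_length(const Point *p, size_t n)
{
    int len = 0;
    for (size_t i = 1; i < n; i++) {
        int dx = p[i].x - p[i - 1].x;
        int dy = p[i].y - p[i - 1].y;
        len += (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
    }
    return len;
}
```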
Security partitioning
Thumbpod-II
• Processor & coprocessor
• Security partitioning
  – Secure ASIC
  – Regular processor
[Figure: ThumbPod-II block diagram. LEON processor (integer unit, 2 KB I-cache, 2 KB D-cache, AHB interface) with AHB controller, memory controller (boot PROM interface, boot ROM, 2 MB SRAM on a 32-bit memory bus) and an AHB/APB bridge to the AMBA peripheral bus (UART1, UART2, RS232). The ASIC is partitioned into a DPA-resistant part and a non-DPA part, which together hold the AES coprocessor, the fingerprint comparator, the template storage and the sensor interface]
DPA attack set-up
WDDL vs. standard cells: AES power traces
[Figure: power-supply current after the encryption start pulse, for the standard-cell and the WDDL implementation]
DPA attack on AES key bytes: SCMOS
DPA attack on WDDL
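A small simulation shows why the WDDL traces resist the attack. The sketch models the leakage of the first-round S-box output in two ways. For standard cells, leakage is the output's Hamming weight plus noise. For an ideally balanced WDDL implementation, it is a constant plus noise. The attack then ranks all 256 key-byte guesses. Assumptions: the leakage models, the noise, and the use of a correlation-style DPA (covariance between predicted Hamming weight and leakage) are choices made for this illustration. They are not the measurement setup used in the lecture.

```c
#include <stdint.h>

/* GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box table: brute-force inverse, then the FIPS-197 affine map. */
static uint8_t SBOX[256];
static void build_sbox(void)
{
    for (int a = 0; a < 256; a++) {
        uint8_t inv = 0;
        for (int c = 1; c < 256 && a != 0; c++)
            if (gmul((uint8_t)a, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
        uint8_t r = inv;
        for (int i = 1; i <= 4; i++)
            r ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
        SBOX[a] = (uint8_t)(r ^ 0x63);
    }
}

static int hw8(uint8_t x) { int n = 0; for (; x; x &= (uint8_t)(x - 1)) n++; return n; }

/* Deterministic xorshift32 noise, uniform in [-0.5, 0.5). */
static uint32_t rng_state = 2463534242u;
static double noise(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return (double)(rng_state >> 8) / 16777216.0 - 0.5;
}

#define N_TRACES 1024  /* four passes over all 256 plaintext byte values */

/* balanced = 0: static CMOS model, Hamming weight of S-box output + noise.
 * balanced = 1: idealised WDDL model, constant + noise. */
void simulate(uint8_t key, int balanced, uint8_t pt[N_TRACES], double leak[N_TRACES])
{
    build_sbox();
    for (int i = 0; i < N_TRACES; i++) {
        pt[i] = (uint8_t)i;
        double signal = balanced ? 4.0 : (double)hw8(SBOX[pt[i] ^ key]);
        leak[i] = signal + noise();
    }
}

/* Rank key guesses by covariance between predicted Hamming weight and
 * leakage. Every plaintext value occurs equally often, so all guesses share
 * the same prediction variance and covariance orders them like a
 * correlation coefficient. Call simulate() first (it builds SBOX). */
int dpa_attack(const uint8_t pt[N_TRACES], const double leak[N_TRACES], double *best_cov)
{
    int best = 0;
    *best_cov = -1e9;
    for (int g = 0; g < 256; g++) {
        double sx = 0.0, sy = 0.0, sxy = 0.0;
        for (int i = 0; i < N_TRACES; i++) {
            double x = hw8(SBOX[pt[i] ^ g]);
            sx += x;
            sy += leak[i];
            sxy += x * leak[i];
        }
        double cov = sxy / N_TRACES - (sx / N_TRACES) * (sy / N_TRACES);
        if (cov > *best_cov) { *best_cov = cov; best = g; }
    }
    return best;
}
```

Under the Hamming-weight model, the correct key byte stands out with a covariance of about 2, the variance of the Hamming weight of a uniformly random byte. The constant-leakage model leaves only noise-level values for every guess.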
Conclusion
• Design of heterogeneous integrated systems-on-chip
• Top-down process
• Many design options
• Bridge the gap between specifications and implementation: KEY competitive advantage!!
THANKS!!