Hardware Core for Off-chip Memory Security Management in Embedded Systems

Russell Tessier (1), Jeremie Crenne (2), Romain Vaslin (2), Guy Gogniat (2), Jean-Philippe Diguet (2), and Deepak Unnikrishnan (1)
(1) University of Massachusetts, Amherst
(2) European University of Brittany
A need for security

- Increase of personal mobile devices (cell phones, MP3 players, GPS)
- Digital convergence: several mobile devices in one
- Security concerns: intellectual property protection, personal information

A need for protection and privacy.
Introduction: • A need for security • Embedded systems & attacks • Threat model • State of the art • Contributions
Embedded systems & potential attacks

Example of an embedded system architecture:

(figure: a board with two chips; CHIP 1 holds a GPP, a DSP, coprocessors, hardware accelerators, local and shared memories, OSes, and a communication stack over a communication network; CHIP 2 holds a GPP with local memory and monitors; around them sit the external memory, the power supply, the external communication interface, and security blocks such as AES, RSA, a turbo code core, a µP, RAM, and a key store)

Threats: viruses/worms, reverse engineering, fault injection, memory modification, bus modification, side channels, bus probing.
The challenge of memory protection & threat model

(figure: the processor core and its instruction/data caches sit in the secure zone; the external memory, holding OS code, OS data, and application data, sits in the untrusted zone, reached over the data bus through the bus control)

External bus access leads to:
- Code extraction/modification
- Private data extraction/modification

Threat model: a secure zone (the chip); any modification and observation on the address and data buses is possible.

Targeted attacks: spoofing, relocation, replay.
(figures: the external memory holds 0xFFAD9024 @0, 0xAD779056 @1, 0x00000045 @2, and 0x0FAE87C4 @3; a read of @3 can return 0xFFFFFFFF (spoofing: an arbitrary forged value), 0xFFAD9024 (relocation: the valid value of @0 moved to @3), or 0xDA0067C4 (replay: a value legitimately stored at @3 at time T=150, replayed at T=550))
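To make the three attacks concrete, here is a minimal software sketch of this threat model. HMAC-SHA256 stands in for the hardware's AES-based tag; the key, the 4-byte values, and the address width are illustrative only.

```python
import hmac
import hashlib

KEY = b"on-chip secret"  # never leaves the secure zone

def tag(data: bytes, addr: int, ts: int) -> bytes:
    # The tag binds a value to its address (defeats relocation) and to a
    # timestamp/counter (defeats replay). HMAC-SHA256 is a stand-in for
    # the AES-based tag computed by the real hardware.
    msg = data + addr.to_bytes(4, "big") + ts.to_bytes(4, "big")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:4]

mem = {}    # untrusted external memory (attacker-controlled)
tags = {}   # on-chip IC tag memory (trusted)
ts = {}     # on-chip timestamp memory (trusted)

def secure_write(addr: int, data: bytes, t: int) -> None:
    mem[addr] = data
    ts[addr] = t
    tags[addr] = tag(data, addr, t)

def secure_read(addr: int) -> tuple:
    data = mem[addr]  # may have been tampered with
    ok = hmac.compare_digest(tags[addr], tag(data, addr, ts[addr]))
    return data, ok

secure_write(0, bytes.fromhex("FFAD9024"), t=150)
secure_write(3, bytes.fromhex("0FAE87C4"), t=150)

mem[3] = bytes.fromhex("FFFFFFFF")  # spoofing: arbitrary forged value
assert secure_read(3)[1] is False
mem[3] = mem[0]                     # relocation: valid value, wrong address
assert secure_read(3)[1] is False   # (the address inside the tag differs)
stale = bytes.fromhex("0FAE87C4")
secure_write(3, bytes.fromhex("DA0067C4"), t=550)  # legitimate update at T=550
mem[3] = stale                      # replay: stale value from T=150
assert secure_read(3)[1] is False   # (the timestamp inside the tag differs)
```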
State of the art

Existing solutions relying on the same threat model:
- AEGIS (MIT): one-time pad / cached hash tree (OS controlled)
- XOM (Stanford): one-time pad / MD5 (OS controlled)
- PE-ICE (LIRMM): AES / tag comparison
- TEC-Tree (Princeton/LIRMM): PE-ICE / hash tree

Issues:
- High memory overhead (>50%)
- Software execution performance loss (>50%)
- Area overhead (several AES cores, MD5 or SHA-1 cores)
(figure: the Hardware Security Core sits in the trusted zone between the processor's instruction/data caches and the DDR IP controller, filtering the address, data, and control lines to the DDR SDRAM memory in the untrusted zone)
Contributions

A solution fitting embedded systems resources:
- Logic size
- Memory footprint (including security data)
- Power consumption
- Performance

A flexible solution for the software designer:
- Flexible architecture
- Flexible security policy

An end-to-end solution:
- Secure system boot-up
- Application updates
- Security updates
(figure: an FPGA hosting the processor core and the security core downloads encrypted applications over Ethernet into Flash memory; encrypted code & data reside in RAM)
Outline:
1. How to guarantee confidentiality & integrity?
2. Hardware security management
3. Evaluation of the security cost
4. End-to-end solution
5. Conclusion & perspectives
Common security tools

AES based:
- Adds latency (~10 cycles per AES computation)
- Critical data path latency in a processor-based architecture (70 cycles for a read)

Hash algorithm based:
- Adds latency (60 to 80 cycles per hash computation)
(figures: on a read request, the ciphered cache line fetched from external memory is deciphered by the AES core, keyed with the AES key and using the address of the cache line, and checked by the hash core before the clear cache line reaches the instruction/data caches in the trusted zone; on a write request, the clear line is ciphered and hashed on its way to external memory)
Part 1 topics: common security tools · AES-CTR mode · fast integrity checking with AES-GCM · confidentiality & integrity in action · comparison with previous work
AES-CTR: an efficient confidentiality scheme

AES in counter mode of operation (AES-CTR); the AES input is composed of:
- A timestamp/counter (against replay)
- The data address (against relocation)
- An initialization vector

Benefit: a deciphering latency gain, since the keystream depends only on these inputs and not on the data itself.
(figure: the 128-bit AES core, keyed with the AES key, turns TS, @, and IV into a keystream; XORing the keystream with the plaintext gives the ciphertext and vice versa; scheduling (a) data fetching, then AES deciphering, then sending data to the core, versus (b) keystream generation (AES) overlapping data fetching, leaving only an XOR before sending data to the core: a latency gain)
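A minimal sketch of the counter-mode idea, with SHA-256 used as a stand-in pseudo-random function for the 128-bit AES core; the key, IV width, and cache-line contents are illustrative assumptions.

```python
import hashlib

def keystream(key: bytes, iv: bytes, addr: int, ts: int, n: int) -> bytes:
    # The cipher input mixes the IV, the data address (anti-relocation)
    # and the timestamp (anti-replay); SHA-256 stands in for AES here.
    out = b""
    blk = 0
    while len(out) < n:
        ctr = (iv + addr.to_bytes(4, "big") + ts.to_bytes(4, "big")
               + blk.to_bytes(4, "big"))
        out += hashlib.sha256(key + ctr).digest()
        blk += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key, iv = b"ukey", bytes(12)
line = b"16-byte cacheln!"  # a 128-bit cache line
ct = xor(line, keystream(key, iv, addr=0x8000020, ts=7, n=len(line)))

# The latency gain: the keystream depends only on (iv, addr, ts), not on
# the ciphertext, so it can be generated while the line is still being
# fetched; deciphering then costs a single XOR.
pt = xor(ct, keystream(key, iv, addr=0x8000020, ts=7, n=len(line)))
assert pt == line
```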
AES-GCM: a counter-based mode with low-latency integrity checking

AES-GCM:
- Is NIST standardized
- Relies on 128-bit AES and can be parallelized and pipelined
- Provides fast integrity checking

Integrity operations rely on Galois field arithmetic:
- A multiplication over GF(2^128) can be done in 1 cycle, but it has to be carefully designed to avoid a huge logic overhead
- A 128-bit data integrity check can be done in 3 additional cycles
(figure: the same scheduling as for AES-CTR, keystream generation (AES) overlapping data fetching, with the integrity check (ic) adding only a few cycles after the XOR)
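The GHASH multiplication can be written down directly from NIST SP 800-38D. The bit-serial version below is the software analogue of what the hardware multiplier computes; a real 1-cycle design unrolls the 128 iterations into a tree of XORs.

```python
# Reduction polynomial of GCM's field, x^128 + x^7 + x^2 + x + 1,
# in GCM's bit order (bit 0 is the most significant bit).
R = 0xE1000000000000000000000000000000

def gf_mult(x: int, y: int) -> int:
    # Multiplication of two 128-bit elements of GF(2^128), as specified
    # for GHASH in NIST SP 800-38D.
    z, v = 0, x
    for i in range(127, -1, -1):
        if (y >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash(h: int, blocks: list) -> int:
    # Authentication chain: fold each 128-bit ciphertext block into the
    # accumulator with one multiplication by the hash subkey H.
    y = 0
    for b in blocks:
        y = gf_mult(y ^ b, h)
    return y

ONE = 1 << 127  # the multiplicative identity in GCM's representation
assert gf_mult(0x12345, ONE) == 0x12345
assert gf_mult(3, 7) == gf_mult(7, 3)  # field multiplication commutes
```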
(figure: AES-GCM datapath; encryption & decryption circuitry: 128-bit AES enciphers the counter blocks IV96 || @32 || TS32 and IV96 || @32 || (TS+1)32, and the results are XORed with plaintext 1 and plaintext 2 to produce ciphertext 1 and ciphertext 2; authentication circuitry: each 128-bit ciphertext block is folded in through MultH, and a final MultH over 0^64 || Len(C)64 produces the tag)
Typical use with a processor architecture – write request

(figure: on a write, the Hardware Security Core takes the output cache line and its address @ from the processor's caches in the trusted area; the timestamp generator produces a fresh TS, stored in the on-chip timestamp memory (TSm); AES-GCM, keyed with the AES key (UKey) and fed the IV, generates the keystream and the IC tag; the tag (ICm) is stored in the on-chip IC tag memory, and the XOR of line and keystream is sent as ciphered data to the external memory in the untrusted area; a bypass path and core control handle unprotected accesses; operations scheduling: keystream generation (AES) overlaps cache data fetching)
Typical use with a processor architecture – read request

(figure: on a read, keystream generation (AES) overlaps the ciphered data fetching from external memory; the fetched line is XORed with the keystream and passed through integrity check generation (ICG); the computed tag is compared with the stored ICm and the timestamp with TSm, and the input cache line is sent to the cache only if valid)
Comparison with the state of the art

Memory footprint for 256 kB of data & 256 kB of code; approaches overview:

            AES-GCM [1]  PE-ICE    TEC-Tree   XOM        AEGIS
Memory      30.4%        54.7%     76.2%      56%        94%
Perf. loss  15%          34%       N/A        63%        N/A
Tag size    32 bits      32 bits   64 bits    128 bits   160 bits
Security    1/2^32       1/2^32    1/2^64     1/2^128    1/2^160
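The AES-GCM figure of 160 kB of security data can be reproduced from the tag and timestamp layout: one truncated 32-bit tag per 128-bit word (as footnote [1] states), plus, for writable data only, a 32-bit timestamp per cache line. The 256-bit line size is our assumption.

```python
def gcm_security_kb(code_kb: float, data_kb: float,
                    word_bits: int = 128, tag_bits: int = 32,
                    line_bits: int = 256, ts_bits: int = 32) -> tuple:
    # Returns (security kB for code, security kB for data).
    tag_code = code_kb * tag_bits / word_bits  # read-only code: tags only
    tag_data = data_kb * tag_bits / word_bits  # data: tags ...
    ts_data = data_kb * ts_bits / line_bits    # ... plus anti-replay timestamps
    return tag_code, tag_data + ts_data

code, data = gcm_security_kb(256, 256)
assert (code, data) == (64.0, 96.0)  # 160 kB total for 512 kB protected
```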
(charts: total security memory per approach, split into off-chip/on-chip code and data: AES-GCM 160 kB, PE-ICE 280 kB, XOM 288 kB, TEC-Tree 390 kB, AEGIS 468 kB; and software performance (%) on the DES, ADPCM, Dhrystone, and object-tracking benchmarks for AES-GCM, PE-ICE, and XOM (MD5))
[1] AES-GCM produces a 128-bit IC tag for each 128-bit word; we keep only the 32 MSBs to avoid the memory penalty. The security level can be increased to 1/2^128 by keeping the full tag.
A need for architecture and security flexibility

The cost of security is high (area, performance, memory):
- It requires resources within the FPGA
- Memory: between 30 & 50% overhead
- Software execution performance: between 15 & 30% overhead

New solutions are needed to save resources: hardware? software? hardware & software?

Goal: offer the designer more control over the security policy.
Part 2 topics: architecture & security flexibility · security memory mapping · SMM construction example · integration of SMM · architecture detailed view
Security memory mapping

- Security management based on the memory mapping of the code & data
- Adapted to applications running with an operating system

Advantages:
- Reduced security memory overhead
- Reduced software execution losses
- Reduced power consumption due to security
(figure: the memory map, holding task 1..n code, OS code, R/W data, OS data, and task 1..n stacks, with each region marked non-protected, confidentiality only, or confidentiality/integrity, contrasted with uniform protection of all code and data)
SMM construction

0x8000020 <alt_exception>:
  8000020:  addi sp,sp,-76
  8000024:  stw  ra,0(sp)
  ...
0x80001d0 <task1>:
  80001d0:  call 800eff8 <OSFlagPend>
  80001d4:  call <alt_timestamp_start>
  80001d8:  cmpge r2,r2,zero
  ...
0x80002e8 <task2>:
  80002e8:  addi sp,sp,-20
  80002ec:  stw  ra,16(sp)
  80002f0:  stw  fp,12(sp)
  ...
0x8000424 <task3>:
  8000424:  call 800eff8 <OSFlagPend>
  8000428:  movhi r4,2049
  800042c:  addi r4,r4,17116
  ...
0x80006ac <task4>:
  80006ac:  stb  r2,9(fp)
  80006b0:  ldbu r2,9(fp)
  80006b4:  cmpgeui r2,r2,119
  ...

Segment 0: base @ 0x8000020, size 1028 bytes, confidentiality & integrity, code
Segment 1: base @ 0x8000424, size 680 bytes, confidentiality only, code
Segment 2: base @ 0x80006ac, size 2048 bytes, confidentiality & integrity, code
Secure architecture with SMM

Security Memory Mapping:
- Not dedicated to a given security mode (AES-GCM here)
- Fully done in hardware, no OS modification
(figure: inside the Hardware Security Core, the SMM table holds one entry per segment (base @, size, security level, code/data) and steers the ciphering/hashing core for every address, data, and control transaction between the processor's caches in the trusted area and the external memory in the untrusted area)
Hardware security core with SMM

(figure: detailed view; the Security Memory Map resolves the address @ into a segment ID that drives the core control and bypass paths, the timestamp generator and timestamp memory, and the IC tag memory; the AES-GCM datapath, keyed with the AES key (UKey), produces the keystream and tag, and the tag comparison validates each input cache line before it is released to the caches)
Experimental approach

Architecture overview: MicroBlaze 7.00, high-resolution timer, Flash bridge, DDR SDRAM bridge, JTAG.

4 applications running with MicroC/OS-II:
- Image processing (morphological image processing)
- Video on demand (RS, AES, MPEG-2)
- Communication (RSd, AES, RSc)
- Multi-hash (MD5, SHA-1, SHA-2)
(figure: the MicroBlaze processor and its instruction/data caches sit in the trusted zone on PLB bus arbiter 1; the Hardware Security Core bridges, as slave on arbiter 1 and master on arbiter 2, the path to the DDR IP controller and the DDR SDRAM memory in the untrusted zone)
Part 3 topics: experimental approach · applications security policy · experimental results · a trade-off for benefits
Applications security policy

- Image processing: only the algorithm core code & data are protected (CI)
- Video on demand: the MPEG decoder code must not be stolen (CO); images must not be stolen (CO); AES sensitive data must be protected (CI)
- Communication: processed data must not be stolen (CO); code must not be attacked (CI)
- Hash: code must not be stolen (CO); processed data may be stolen

(chart: percentage of protected vs. unprotected code and data per application; the OS is always protected; for some applications the programmable protection is close to uniform protection, for others it protects far less)
Logic area overhead

- Uniform protection: CI or CO for the whole memory
- Programmable protection: policy decided by the software designer

Base MicroBlaze architecture: ~3335 LUTs

Application   Programmable protection     Uniform protection
              HSC      µB + HSC           HSC      µB + HSC
Hash          2543     5878               2576     5911
Comm.         3510     6845               3470     6805
VOD           3599     6934               3619     6954
Image         3627     6962               3485     6820

HSC logic overhead relative to the base MicroBlaze ranges from roughly +76% to +108%, depending on the application and policy.
[2] All results target a Spartan-6 SP605 device (XC6SLX45T). The base configuration uses a MicroBlaze with 2 kB data/instruction caches operating at 86 MHz.
Software performance losses compared with the non-protected approach

Performance loss is security-policy dependent:

Application   No protection (ms)   Uniform protection (ms)   Programmable protection (ms)
Image 2k      131.3                156.9   (-19.5%)          146.9   (-11.9%)
VOD 2k        11940.3              13751.2 (-15.2%)          13453.5 (-12.7%)
Comm 2k       60.2                 66.7    (-10.8%)          65.4    (-8.6%)
Hash 2k       7.5                  8.7     (-15.9%)          8.6     (-14.4%)

Average loss: -15.35% (uniform) versus -11.9% (programmable).
Security memory footprint

The memory overhead depends entirely on the designer's choice of security policy.

(chart: security memory footprint in kB, split into TS data, IC tag data, and IC tag code, for uniform (UP) and programmable (PP) protection across Image, VOD, Comm, and Hash; depending on the application, the programmable footprint drops to between 23% and 100% of the uniform one)
A trade-off between security, resources & performance

Benefits of our complete approach:
- Increases software performance versus uniform protection (~ +3%)
- Reduces the security memory footprint (~ -50%)
- Increases security flexibility for the designer
- At the price of a larger logic size (~ +3%)

The exact values depend on the security policy & the designer's wishes.
Lightweight boot approach

Boot is an infrequent task. Challenges of secure boot:
- Secure FPGA configuration
- Secure code loading into RAM from Flash

Again, the approach is cost-conscious:
- Low-logic boot scheme
- Low power drawn by the boot logic during execution

Issues to tackle:
- Efficient & secure loading of Flash data into RAM memory
- Initialization of the ciphered data in RAM, and of the IC tags & TS in on-chip memory
(figure: as before, encrypted applications are downloaded over Ethernet into the Flash memory; the security core on the FPGA loads the encrypted code & data into RAM)

Part 4 topics: context of secure boot · system boot-up case study · experimental results
Boot execution scheme

Boot is done in two steps:
1. Secure FPGA configuration
2. Secure application loading from Flash to RAM memory
(figure: the bitstream securely configures the FPGA, protecting the FPGA supplier's data; the Hardware Security Core's ciphering/hashing core (CHC) then moves the application code from Flash to RAM, deciphering it under the LoadGCM policy and re-ciphering it under the ExecGCM policy, while initializing the SMM, the timestamps, the initialization vector, and the IC tag memory)
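A sketch of the second boot step. The names LoadGCM and ExecGCM come from the figure; the keystream and tag functions are SHA-256/HMAC stand-ins for the AES-GCM hardware, and the keys and line size are illustrative.

```python
import hashlib
import hmac

def ks(key: bytes, addr: int, n: int) -> bytes:
    # Address-bound stand-in keystream (SHA-256 instead of AES-CTR).
    out = b""
    for blk in range((n + 31) // 32):
        out += hashlib.sha256(key + addr.to_bytes(4, "big")
                              + blk.to_bytes(4, "big")).digest()
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

LOAD_KEY, EXEC_KEY, TAG_KEY = b"LoadGCM", b"ExecGCM", b"tag"

def secure_boot(flash: dict) -> tuple:
    # Each line is deciphered under the LoadGCM policy, re-ciphered
    # under the ExecGCM policy for RAM, and its IC tag initialized.
    ram, ic_tags = {}, {}
    for addr, ct in flash.items():
        clear = xor(ct, ks(LOAD_KEY, addr, len(ct)))
        ram[addr] = xor(clear, ks(EXEC_KEY, addr, len(ct)))
        ic_tags[addr] = hmac.new(TAG_KEY, clear + addr.to_bytes(4, "big"),
                                 hashlib.sha256).digest()[:4]
    return ram, ic_tags

# A one-line "Flash image" ciphered under the load policy:
flash = {0x10: xor(b"application code", ks(LOAD_KEY, 0x10, 16))}
ram, ic_tags = secure_boot(flash)
assert xor(ram[0x10], ks(EXEC_KEY, 0x10, 16)) == b"application code"
```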
AES-GCM results

Application secure boot time:
- Very low memory overhead: a 32-bit counter value, a 96-bit initialization vector, a 128-bit IC tag
- Non-protected boot time: less than 5 ms
- Extra boot time due to AES-GCM: ~500 µs

Boot time is a small part of the system lifetime.
(chart: trend of boot time depending on the security policy; non-protected (NP) at 100%, AES-GCM with uniform protection (UP) and with programmable protection (PP) at roughly 104 to 106%)
Conclusion & perspectives
Conclusion

A cost-conscious approach fitting embedded systems resources: low-cost security.

A full evaluation of the security cost:
- Area: ~75 to 110%; memory: ~20 to 30%
- Cost of flexibility: ~3%

An end-to-end solution:
- Secure boot-up
- Multi-application support & multi-configuration architecture: configurable SMM at boot-up, boot loader
Perspectives

- CAD tool: security policy & resource exploration
- Extended threat model: evaluating the cost of protection against DPA, fault injection, …; behavior-guessing protection
- Exploring the FPGA's reconfiguration capabilities: a weakness (a new path for potential threats)? a strength (dynamic behavior)?
- Emerging technology, a new challenge: multi-processor architectures; multi-OS architectures, OS virtualization