EM Analysis of ECC Computations
on Mobile Devices
by
Simon C. K. Ho
A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2005
© Simon C. K. Ho 2005
I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the
purpose of scholarly research.
I further authorize the University of Waterloo to reproduce this thesis by photocopying or by
other means, in total or in part, at the request of other institutions or individuals for the purpose of
scholarly research.
The University of Waterloo requires the signatures of all persons using or photocopying
this thesis. Please sign below, and state an address and date.
Abstract
Internet-enabled mobile devices, such as PDAs and mobile phones, open the door to a
slew of new commercial applications and services. However, these devices also impose a
unique set of security challenges due to their mobility. In particular, they may be
vulnerable to a type of side channel attack known as EM analysis, which analyzes the
correlation between the leaked EM emanations and the secrets in the cryptographic
computations. Understanding the threats of EM analysis is vital to evaluating the security
of these PDA devices.
Many secure applications use public key cryptography to provide
authentication, integrity, non-repudiation and encryption. ECC (Elliptic Curve
Cryptography) is a particularly suitable public-key system for mobile devices as it is
more efficient than other common cryptographic systems.
This thesis explores the vulnerabilities of ECC computations on mobile devices to
EM analysis: SEMA (Simple EM Analysis) and DEMA (Differential EM Analysis).
New analysis techniques and attack methodologies pertinent to ECC and PDAs are
proposed. It is found that the use of AI neural network techniques and integer
optimization models can improve the power of SEMA. In DEMA, the choice of partition
bit and reference signal can increase its effectiveness.
Furthermore, the use of power spectrum density and spectrogram signals for EM analysis
is proposed and examined. It is found that spectrogram signals are more effective than
time domain and power spectrum density signals for both SEMA and DEMA.
Acknowledgements
I would like to thank my supervisor, Professor Cathy Gebotys, for all her advice,
guidance and encouragement. I would also like to thank my parents and friends for their
support.
I greatly appreciate the generous financial support provided by Professor Gebotys
through a Research Assistantship. I am also grateful for the scholarships awarded to me
by the Department of Electrical and Computer Engineering at the University of Waterloo.
Table of Contents

Abstract
List of Tables
List of Figures
List of Algorithms
1 Introduction
  1.1 Research Motivation
  1.2 Thesis Objective
  1.3 Thesis Overview
2 Introduction to EM Signal Capture and Analysis
  2.1 Origin and Types of EM Signals
  2.2 Capture of EM Signals
  2.3 Direct Emanation
  2.4 Unintended Emanation
  2.5 Benefits of EM Analysis of PDA
  2.6 Spectrogram
3 Introduction to Side Channel Attacks
  3.1 Timing Analysis
  3.2 Fault Analysis
  3.3 Simple Analysis on Power/EM Signals
  3.4 Differential Analysis on Power/EM Signals
  3.5 Template Attack on Power/EM Signals
4 Introduction to Elliptic Curve Cryptography
  4.1 Mathematical Overview
  4.2 Benefits for PDA Implementation
  4.3 Implementation
  4.4 Countermeasures to Thwart Side Channel Attacks
5 Proposed Methodology of DEMA
  5.1 Proposed Trace Splitting Strategy
  5.2 Proposed Differential Analysis of Traces
  5.3 Proposed Differential Analysis in Frequency and Spectrogram
  5.4 Attack Strategy on Known Point Operation
  5.5 Proposed Attack Strategy on Unknown Point Operation
  5.6 Proposed Attack Strategy on Window Method
6 Proposed Methodology of SEMA
  6.1 Motivation of Using Neural Network
  6.2 Neural Network Structure
  6.3 Preprocessing
  6.4 Training
  6.5 Classification
  6.6 Combination of Classification Results
  6.7 Integer Optimization Model
7 Experimental Setup and Methodology
  7.1 Target Hardware Platform
  7.2 Target Software Platform
  7.3 ECC Program Implementation
  7.4 Measurement Setup and Technique
  7.5 Oscilloscope Configuration
8 Experimental Results of DEMA
  8.1 Setup
  8.2 Results of Trace Splitting
  8.3 Results of Time Domain Analysis
  8.4 Results of Power Spectrum Density Analysis
  8.5 Results of Spectrogram Analysis
  8.6 Comparisons
9 Experimental Results of SEMA
  9.1 Setup
  9.2 Parameters of Neural Network
  9.3 Results of Neural Network Using Time Domain Signals
  9.4 Results of Neural Network Using Spectrogram Signals
  9.5 Results of Template Attack
  9.6 Results of Averaging and Integer Optimization Model
  9.7 Comparisons
10 Discussion and Conclusions
  10.1 Limitation of Research and Implementation
  10.2 Summary
  10.3 Countermeasures
  10.4 Future Work
Appendix A – Java API for 192-bit Prime Field (PF192)
Appendix B – Java API for 192-bit ECC (ECC_P192)
Appendix C – GAMS Model for Integer Optimization
Appendix D – MATLAB Code for Neural Networks
Bibliography
List of Tables
Table 6-1: Error Correction with Integer Optimization Model
Table 7-1: Elliptic Curve P-192 Parameters [NIST]
Table 7-2: Costs of Point Operations [BHL00]
Table 8-1: Greatest Multiples of SD_DoM Below Peak Amplitude
Table 9-1: CPU Time and Error % for Different NN Training Algorithms
Table 9-2: Optimal Parameter Values of a NN Using Time Domain Signals
Table 9-3: CPU Time and Error % for Different NN Training Algorithms
Table 9-4: Optimal Parameter Values of a NN Using Spectrogram Signals
Table 9-5: % Error Rate of Different Algorithms
List of Figures
Figure 2-1: CMOS Gate
Figure 2-2: 3D Diagram of EM Emanation [QS01]
Figure 2-3: Near Field Probe [GMO01]
Figure 2-4: Spectrogram
Figure 5-1: MSB Partitioning
Figure 5-2: Constant Reference Signal
Figure 5-3: SD-DoM Reference Signal
Figure 5-4: Ideal Differential Signal
Figure 5-5: Actual Differential Signal
Figure 5-6: Relationship Between Different Signals
Figure 5-7: Attack Strategy on Unknown Point Operation
Figure 5-8: Differential Signal for Different Key Bits
Figure 6-1: Signal Component of Operation A Template
Figure 6-2: Signal Component of Operation B Template
Figure 6-3: Neural Network Structure [S96]
Figure 6-4: Tan-Sigmoid Transfer Function [MK]
Figure 6-5: Signal Envelope
Figure 6-6: Gradient Descent Algorithm [ANC]
Figure 6-7: Effect of Combination of Classification Results
Figure 7-1: Java Runtime Environment
Figure 7-2: Measurement Setup
Figure 7-3: EM Probe
Figure 8-1: Differential Signal for Correct Bit Partitioning on 2nd MSB
Figure 8-2: Differential EM Signal for Correct Partitioning on 3rd MSB
Figure 8-3: Differential EM Spectro for Correct Partitioning on 2nd MSB
Figure 8-4: Differential EM Spectro for Correct Partitioning on 3rd MSB
Figure 8-5: Differential EM Signal of ECC Double with Correct Guess
Figure 8-6: Differential EM Signal of ECC Double with Incorrect Guess
Figure 8-7: Differential EM PSD of ECC Double with Correct Guess
Figure 8-8: Differential EM PSD of ECC Double with Incorrect Guess
Figure 8-9: Differential PSD of ECC Double with Correct Guess
Figure 8-10: Differential PSD of ECC Double with Incorrect Guess
Figure 8-11: Differential EM Spectro of ECC Double with Correct Guess
Figure 8-12: Diff EM Spectro of ECC Double with Incorrect Guess
Figure 8-13: A Frame of Differential EM Spectro with Correct Guess
Figure 8-14: A Frame of Differential EM Spectro with Incorrect Guess
Figure 9-1: EM Signal from ECC Double Operation #1
Figure 9-2: EM Signal from ECC Double Operation #2
Figure 9-3: EM Signal from ECC Addition Operation #1
Figure 9-4: EM Signal from ECC Addition Operation #2
Figure 9-5: Plot of Accuracy vs. Envelope Size
Figure 9-6: Plot of Accuracy vs. Min Fraction
Figure 9-7: Plot of Accuracy vs. Input Layer Size
Figure 9-8: Plot of Accuracy vs. Hidden Layer Size
Figure 9-9: Plot of Accuracy vs. Window Size
Figure 9-10: Plot of Accuracy vs. Overlap Size
Figure 9-11: Plot of Accuracy vs. Envelope Size
Figure 9-12: Plot of Accuracy vs. Min. Fraction
Figure 9-13: Plot of Accuracy vs. Input Layer Size
Figure 9-14: Plot of Accuracy vs. Hidden Layer Size
Figure 9-15: Plot of Accuracy vs. Observation Size
Figure 9-16: Plot of Memory vs. Observation Size
Figure 9-17: Plot of Accuracy vs. Number of Executions
Figure 9-18: Plot of Accuracy vs. Number of Executions
List of Algorithms
Algorithm 4-1: Double-and-Add Scalar Multiplication
Algorithm 4-2: Add-Subtract Scalar Multiplication
Algorithm 4-3: Double-and-Add-Always Scalar Multiplication
Algorithm 5-1: Modular Addition
Algorithm 5-2: Modular Subtraction
Algorithm 5-3: Standard Deviation of Difference of Means
Algorithm 5-4: Ratio Between Differential and Reference Signal
Algorithm 5-5: Spectrogram
Algorithm 9-1: Optimization of the Number of Neurons
1 Introduction
Mobile Commerce, or m-commerce, encompasses a variety of commercial services and
products that are accessible from Internet-enabled mobile devices, such as PDAs and
mobile phones. Their mobility opens the door to a slew of new applications and services.
They follow us wherever we go, making it possible to shop online while riding on a
subway train or finding a nearby restaurant while walking down the street. However,
these devices impose a unique set of constraints and security challenges.
The security of m-commerce relies on the underlying public key cryptographic
functions to provide authentication, integrity, non-repudiation and encryption.
Traditional cryptanalysis techniques view a cryptography system as a black-box, and
exploit weaknesses purely at the algorithm and protocol levels. However, it is far more
powerful to exploit weaknesses in the implementation level, particularly information
inadvertently leaked to other information channels known as side channels.
Most side channel attacks require the attackers to physically access and tamper
with the mobile devices. A careful user can prevent these attacks by securing the device
from theft, or at least minimize the damage upon discovering that it is lost.
A much more devastating attack would be one that requires no physical access
and can be performed without the user’s knowledge, say, when an unsuspecting user is
shopping online during a subway ride. This is possible if the attack analyzes information
leaked from electromagnetic (EM) emanations from these mobile devices. EM radiation
from mobile devices may be captured from several feet away [AAR02]. Furthermore,
the attack would be even more devastating if the attacker can perform the analysis without
knowing the plaintext and ciphertext in the cryptographic operations. This thesis
explores the feasibility of performing EM analysis on a type of public key cryptographic
system, Elliptic Curve Cryptographic (ECC) system, running on mobile devices.
1.1 Research Motivation
Conventional cryptanalysis techniques tend to need a huge amount of computational
resources relative to side channel attacks. Therefore side channel attacks are more
threatening than conventional cryptanalysis in practice.
In the past few years, much research attention has been afforded to the application
of side channel attacks to smart card devices. Unfortunately, very little research has been done
to investigate side channel attacks on mobile devices. Attacks on mobile devices have
become an important issue as these devices become more pervasive and m-commerce on
these devices becomes more prominent. PDAs are particularly suitable for m-commerce
applications, and they are chosen to be the hardware platforms for this research.
Mobile devices are very vulnerable to EM analysis as their mobility makes their
EM emanations accessible to an attacker. Furthermore, most mobile devices have limited EM
shielding and are more susceptible to theft. However, little research has been done on EM
analysis of these mobile devices.
The most crucial components of secure m-commerce are entity authentication, data
integrity and non-repudiation, which are provided by public key protocols such as digital
signatures. However, public key protocols tend to be very computationally expensive, and
PDAs have very limited computational power and memory capacity. Therefore, the
research focus is on Elliptic Curve Cryptography (ECC) which is a very efficient public
key algorithm and is most suitable for mobile devices. ECC is also widely accepted by
research communities and industries. Hence ECC computation is the target algorithm
investigated in this research.
Hopefully, by understanding the threat of EM analysis attacks on ECC computations
of PDA devices, we can better secure these systems from adversaries.
1.2 Thesis Objective
The main purpose of the thesis is to present new research findings in the EM analysis of
ECC computations on PDA devices. To this end there are four objectives of this thesis.
The first objective is to investigate the techniques used to capture the EM emanations
from a PDA, which encompass locating the best source of EM emanation as well as the
most appropriate equipment and configurations.
Once the EM emanations are captured, they are processed and analyzed to extract
useful information. The second objective is to investigate and compare the different
analysis techniques such as time domain analysis, power spectral density (PSD) and
spectrogram.
The third objective is to present new and more effective methodologies of
differential EM analysis (DEMA) on ECC computations that are particularly suitable for
PDA devices. This includes discussion on how to partition the EM signals for differential
analysis, algorithms for DEMA, and general strategies for different ECC computation
algorithms.
The final objective is to investigate the novel application of artificial intelligence
programming paradigms to SEMA. The thesis details the signal preprocessing, training
and testing required for the AI technique. It also compares its performance with the
original template attack in SEMA.
1.3 Thesis Overview
This thesis is composed of 10 chapters, and the remaining chapters can be roughly divided
into 4 main parts. Chapters two to four provide the technical background material
required for understanding the concepts in this thesis. Chapter two provides information
on the origins of EM signals and common capture techniques. Chapter three gives a brief
background on different types of side channel attacks applied on power and EM
emanation. Chapter four introduces the mathematical background and implementations
of ECC algorithms on PDA.
Chapters five and six describe my research contributions. Chapter five presents
my new methodologies and strategies for performing DEMA. Chapter six presents my
invention that incorporates AI programming paradigm into a template attack model for
SEMA.
Chapters seven to nine present the experimental setup and results. Chapter seven
discusses experimental setup and methodology, which apply to both DEMA and SEMA
experiments. Chapter eight describes the experimental setup and results for DEMA.
Chapter nine describes the experimental setup and results for SEMA. This includes the
experimental steps for finding the parameters in the AI attack system.
Chapter ten is the conclusion chapter, which discusses the limitations,
countermeasures, summary and future work for this research.
2 Introduction to EM Signal Capture and Analysis
In the early 1950s, the U.S. government became concerned that an enemy could reconstruct
sensitive information from EM signals radiated from cryptographic equipment [Murray].
Some encrypted teletype units were found to radiate small traces of cleartext signals
beneath the normal encrypted output. Sophisticated equipment could be used to isolate and
amplify the cleartext signals.
Gandolfi et al. were the first to provide concrete results on EM attacks on modern
cryptographic devices [GMO01]. Soon after, Quisquater et al. extrapolated attack
strategies from power signals (SPA and DPA) to EM signals (SEMA and DEMA) [QS01].
The following sections describe the types of EM signals, capture techniques, their
application to PDA devices, and spectrogram analysis of these signals.
2.1 Origin and Types of EM Signals
EM emanation is caused by current flow within the control, I/O, data processing, or other
parts of a device. Any electrical current flowing through a conductor induces
electromagnetic emanations. For instance, during the switching of a CMOS gate, shown in
Figure 2-1, a short current pulse travels from the power line to the ground line, thereby
emitting an EM signal whenever the logic state flips.
Figure 2-1: CMOS gate
This partially explains the correlation between the EM signal and the transition’s
Hamming distance. However, not only do current carrying components produce their
own emanation, but they also affect emanations from other components due to coupling
and circuit geometry.
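The correlation between EM amplitude and transition activity can be sketched with the Hamming-distance leakage model commonly used in the side-channel literature. The class and constants below are illustrative assumptions for exposition, not part of the thesis implementation.

```java
// Sketch of the Hamming-distance leakage model: the EM amplitude at a
// switching instant is assumed proportional to the number of bits that
// flip between consecutive register states. GAIN and OFFSET are arbitrary.
public class HammingLeakageSketch {

    // Number of bit flips between the previous and current state.
    public static int hammingDistance(int prev, int curr) {
        return Integer.bitCount(prev ^ curr);
    }

    // Idealized leakage: a static offset plus a per-flip contribution.
    public static double leakage(int prev, int curr) {
        final double GAIN = 0.35;   // per-bit contribution (assumed)
        final double OFFSET = 1.2;  // static consumption (assumed)
        return OFFSET + GAIN * hammingDistance(prev, curr);
    }

    public static void main(String[] args) {
        // 0x0F -> 0xF0 flips all eight low bits: maximal byte transition.
        System.out.println(hammingDistance(0x0F, 0xF0)); // prints 8
    }
}
```

Under this model, a transition that flips more bits leaks a larger signal, which is the statistical handle that differential analysis exploits.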
Different areas of the chip radiate with different intensities and varying code
dependencies. However there is an important distinction between high energy signals
and high information signals. A data bus may be a good source of high information
signals as they may be correlated with the bus data. On the other hand, power lines may
produce high energy signals that have no data correlation. Experimentally the most
active points appear to be located near the CPU and data buses. Figure 2-2 is a 3-D
diagram showing the region of highest EM emanation on a processor [QS01]. One
should take precautions to make sure the high information signals are not overwhelmed
by the high energy but low information signals.
Figure 2-2: 3D diagram of EM emanation [QS01]
Although the spectral power of EM signals decreases with increasing clock
frequency of the computing device, the radiation effectiveness varies directly with
frequency [QS01]. Hence modern computing devices running at high clock frequencies
are more vulnerable to EM analysis.
One important difference between EM emanation and other side channels, such as
power signals, is that the output of even a single EM sensor consists of multiple
compromising signals of different types, strengths, and information content. Each active
component of the device produces and induces different emanations, which provide
different views of the events occurring within the devices. These views can be obtained
by using different types and positions of sensors, or by simply focusing on different
emanations captured by a single sensor. This is very different from power analysis where
there is only a single view of net current flow.
2.2 Capture of EM Signals
EM signals propagate via either radiation, conduction, or a complex combination of both
methods from the device. Conductive emanations consist of tiny currents found on all
conductive surfaces or lines attached to the device, possibly riding on top of stronger,
intentional currents within the same conductors, such as a power line. Current probes and
an oscilloscope can be used to measure these tiny currents. In fact, the capture and processing
equipment for conductive emanation is very similar to those for power analysis.
The EM radiation signals may be captured by various kinds of EM probes. An
example of these probes consists of a small highly conducting metal, such as copper or
silver plate, attached to a coaxial cable. Another type consists of a solenoid made of coiled
copper wire with an outer diameter between 150 and 500 microns, shown in Figure 2-3.
Since EM analysis without direct physical access is far more dangerous, the radiation
signal is the focus of this research.
The quality of the received signal improves if the equipment is shielded from
interfering EM emanations in the band of interest. Ideally EM analysis should be
performed inside a Faraday cage that shields the equipment from ambient EM
emanations. However, this is difficult to accomplish and it is far more productive to
ensure there is no strong source of interfering EM signals [GMO01]. There are two
general types of EM emanations that may correlate with secret data: direct emanation and
unintended emanation [AAR02].
Figure 2-3. Near field probe [GMO01]
2.3 Direct Emanation
Direct emanation is EM radiation induced by current flows within a computing device.
Since activities within a device are synchronized with the system clock, current flows
tend to occur in short bursts with sharp rising edges resulting in emanations observable
over a wide frequency band [AAR02]. Often, higher-frequency components can be
easier to detect, as noise and interference are prevalent in the lower frequency bands.
In general, direct emanation signals are weak and difficult to detect. In complex
circuits, isolating direct emanations may require tiny field probes positioned very
close to the signal source, or even direct attachment of the probes to the signal sources
[GMO01]. In some cases this requires decapsulating the device package. Filters and
amplifiers are necessary to improve the captured signal.
2.4 Unintended Emanation
Unintended emanations are caused by electrical and electromagnetic coupling between
components in close proximity, a consequence of the increasing miniaturization and
complexity of modern CMOS devices [AAR02]. Although these couplings are mostly
harmless to the device function, they provide a rich source of information signals. A
weak and otherwise undetectable information signal can modulate a strong carrier signal,
thereby allowing the information to be recovered.
There are two classes of unintended emanations. The first class is the amplitude-
modulated (AM) signal, which is caused by non-linear coupling between a carrier signal
and a data signal [AAR02]. The data signal can be extracted by demodulation using a
receiver tuned to the carrier frequency. A strong carrier source is the ubiquitous
harmonic-rich square-wave clock signal. An advantage of using a harmonic-rich carrier
such as a clock is that the attacker can choose higher harmonics of the clock frequency,
which radiate more effectively and lie in a frequency band with less noise and
interference. The other class is the phase-modulated (PM) signal, which results from
coupling between data signals and communication circuits. The data signal may then be
recovered by phase demodulation of the generated signal [AAR02].
In general, extracting information from unintended emanations is easier because a
modulated carrier propagates substantially farther than direct emanation. Modulated
carriers can be detected several feet away from the device [AAR02]. In contrast, direct
emanation must be detected by sensors within a couple of millimeters of the source.
Hence EM analysis can be launched from a distance and without resorting to invasive
techniques.
The probe should be connected to a receiver that demodulates the signal, which is
in turn connected to a digital scope. If the unintended emanation is an AM signal riding
on a harmonic-rich carrier, it may be advantageous to capture a signal modulated onto a
harmonic of the carrier, as high frequency signals propagate better. Lower harmonics
suffer from noise and interference, while higher harmonics have lower signal strength
due to non-ideal clock waveforms [AAR02]. The receiver/demodulator should be tuned to
the carrier harmonic with the best tradeoff between signal frequency and strength.
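To make the receiver's job concrete, the following sketch simulates an AM signal (a data-dependent envelope riding on a carrier) and recovers the data by the crudest form of demodulation: rectification followed by a moving-average low-pass filter. All parameters (sampling rate, carrier frequency, modulation depth) are invented for illustration and do not come from the thesis measurements.

```python
import numpy as np

fs = 50_000_000            # 50 MS/s sampling rate (assumed)
f_carrier = 3_000_000      # a 3 MHz clock harmonic acting as carrier (assumed)
t = np.arange(0, 0.001, 1.0 / fs)

# Hypothetical slow "information" signal leaking onto the carrier amplitude
data = 0.5 + 0.5 * np.sin(2 * np.pi * 1_000 * t)
captured = (1.0 + 0.3 * data) * np.cos(2 * np.pi * f_carrier * t)

# Crude AM demodulation: rectify, then low-pass with a moving average
rectified = np.abs(captured)
kernel = np.ones(501) / 501          # ~10 us window, spans ~30 carrier cycles
envelope = np.convolve(rectified, kernel, mode="same")

# The recovered envelope tracks the hidden data signal (edges trimmed)
corr = np.corrcoef(envelope[1000:-1000], data[1000:-1000])[0, 1]
```

A real receiver would mix down and filter around the chosen harmonic instead, but the envelope-tracking idea is the same.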
2.5 Benefits of EM Analysis of PDA
Common side channel techniques use power signals. However, EM signals are more
suitable for attacking PDAs. First of all, a PDA has a more powerful processor operating
at a higher clock frequency, which produces stronger EM radiation. Secondly, it is
inconvenient for an attacker to obtain power signals from a PDA because PDAs operate on
an internal battery as opposed to an external power source; obtaining a power signal
would require physical access to the device. On the other hand, obtaining EM signals
from a PDA is relatively easy because PDAs are mobile and their signals can be captured
by adversaries while the devices are in use. Finally, even if an attacker has obtained a
PDA device, it is easier to use EM emanation as a side channel. Measuring the power
drained from a PDA battery is like finding a needle in a haystack: a PDA contains many
components, such as a DSP processor, non-volatile memory, a radio receiver and an LCD
screen, and the attacker must find the component that produces compromising power
signals. With EM emanation, it is far more convenient to determine which component
produces the strongest information-rich signal. For these reasons, EM analysis is
particularly suitable for PDA devices.
2.6 Spectrogram
Typically, SEMA and DEMA are performed on time domain and frequency domain
signals. However, it is also possible to perform the analysis on spectrogram signals. A
spectrogram is a form of time-dependent frequency analysis: it consists of the frequency
domain signals of a successive sequence of time windows, as shown in Figure 2-4. The
vertical axis represents frequencies up to about 6 MHz, the horizontal axis shows time
increasing toward the right up to 2 ms, and the color indicates the frequency signal
amplitude.
Figure 2-4. Spectrogram
The optimal window size varies with the application and may be found
experimentally. There is a trade-off between frequency resolution and time resolution:
the window size is directly proportional to frequency resolution but inversely
proportional to time resolution. Higher frequency resolution provides more detail about
the frequency content; higher time resolution shows more precisely how the frequency
content changes over time. The frequency range is determined only by the sampling rate
of the signal acquisition.
The spectrogram windows should overlap so that signals near the edges of the
windows are not lost in the frequency analysis. A higher degree of overlap reduces the
loss of frequency information, but incurs heavier computation.
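The windowing and overlap described above can be sketched as a short-time Fourier transform. This is a minimal illustration, not the thesis's analysis code; the window size, overlap, and Hann window choice are assumptions.

```python
import numpy as np

def spectrogram(signal, window_size=256, overlap=128):
    """Magnitude STFT: FFT of successive, overlapping windowed frames."""
    step = window_size - overlap          # hop between window starts
    window = np.hanning(window_size)      # taper to reduce edge effects
    frames = []
    for start in range(0, len(signal) - window_size + 1, step):
        frame = signal[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # rows = time windows, columns = frequency bins
    return np.array(frames)

# A test tone at 1/8 of the sampling rate should peak in the matching bin
sig = np.sin(2 * np.pi * 0.125 * np.arange(4096))
spec = spectrogram(sig)
peak_bin = spec[0].argmax()               # 0.125 * 256 = bin 32
```

Doubling `window_size` halves the bin width (better frequency resolution) but spreads each row over twice the time, which is the trade-off noted above.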
The spectrogram has many important applications in engineering. In speech analysis,
for example, the phonemes of a spoken word have unique and distinguishable signatures
in the frequency domain, and a speech recognition system uses spectrograms to detect the
sequence of phonemes in a word. It was found that the spectrogram is also useful in
differential analysis attacks on either power or EM signals.
3 Introduction to Side Channel Attacks
A side channel is an unintended channel that leaks sensitive information from a
cryptographic computation. In theory a side channel exists in every cryptographic
system. The underlying implementations of all cryptographic algorithms are physical
processes where data elements are represented by physical quantities (e.g. electric charge)
stored in a physical structure (e.g. a transistor) [GMO01]. These physical quantities
require a minimum time to be sensed, transmitted, and stored. As well, all computations
involve state changes in the underlying physical structures which, by the laws of
thermodynamics, must cause an irreversible conversion from one form of energy to another.
Therefore all computations emit a certain amount of energy at distinguishable time
intervals, and these emissions form the basic components of a side channel.
Previous research in side channel attacks has focused on applications such as pay-
TV smart cards and prepayment meter tokens. The five types of side channel attacks
discussed here are timing analysis [K96], fault analysis [BDL], simple analysis [O02],
differential analysis [KJJ99] and template attacks [CRR02]. The latter three forms of
attack are commonly applied to power and EM signals, which correlate with bits of
internal storage during encryption. At each clock cycle, the activity of the transistors
produces a unique signature in these side channels that may be exploited to recover the
secret keying material. Power signals may be extracted from the device's battery or an
external power source; the EM signal may be captured with an appropriate EM probe.
3.1 Timing Analysis
The earliest and most primitive form of side channel attack uses the timing
characteristics of a cryptographic algorithm's implementation to break it. For example,
the length of time required to compute a scalar multiplication based on binary expansion
correlates with the Hamming weight of the scalar, since a point addition is performed
only when a scalar bit equals one.
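The correlation between running time and Hamming weight can be seen in a toy operation-count model of left-to-right double-and-add (this is an illustrative model, not a real timing measurement):

```python
def double_and_add_ops(d):
    """Count point operations in left-to-right double-and-add,
    as a toy proxy for execution time."""
    bits = bin(d)[2:]
    doubles = len(bits)            # one doubling per scalar bit
    adds = bits.count("1")         # one addition per 1 bit
    return doubles + adds

# Two scalars of equal bit length but different Hamming weight
t_sparse = double_and_add_ops(0b10000001)   # weight 2
t_dense = double_and_add_ops(0b11111111)    # weight 8
```

The denser scalar costs more operations, so its total time leaks its Hamming weight even though the bit length is identical.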
Paul Kocher et al. showed how measurements of the time required to perform
private key operations can be used to find the key exponents, thereby breaking the
cryptographic system [K96]. An obvious countermeasure is to remove any correlation
between the secret and the operation timing by adding redundant operations.
In general, timing analysis is not a practical cryptanalysis technique. First of all,
it is difficult to precisely measure the duration of a cryptographic operation. Secondly,
timing analysis provides only a limited amount of information about the secret in a
cryptographic system, such as the Hamming weight of a scalar. Therefore, timing
analysis is not considered a serious threat to security.
3.2 Fault Analysis
Fault analysis was initially proposed by Boneh et al. [BDL] and applies to the
algebraic structures used in public key cryptography. For an implementation of RSA
based on the Chinese remainder theorem, Boneh et al. showed that, given one faulty
version of an RSA signature, one can efficiently factor the RSA modulus with high
probability.
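The Boneh et al. observation can be demonstrated end-to-end with toy parameters. This sketch uses deliberately tiny, insecure primes of my own choosing: a signature is computed with RSA-CRT, one CRT half is corrupted, and a gcd with the modulus recovers a factor.

```python
from math import gcd

# Toy RSA-CRT parameters (illustrative, insecure sizes)
p, q = 1009, 1013
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
m = 123456 % n

# Correct CRT signature: compute mod p and mod q, then recombine
sp = pow(m, d % (p - 1), p)
sq = pow(m, d % (q - 1), q)
qinv = pow(q, -1, p)
s = (sq + q * ((qinv * (sp - sq)) % p)) % n

# Fault: the half computed mod p is corrupted before recombination
sp_faulty = (sp + 1) % p
s_faulty = (sq + q * ((qinv * (sp_faulty - sq)) % p)) % n

# s_faulty is still correct mod q but wrong mod p, so the gcd reveals q
recovered_factor = gcd(pow(s_faulty, e, n) - m, n)
```

Since the faulty signature verifies modulo q but not modulo p, the difference `s_faulty^e - m` is divisible by q alone, and the gcd factors n from a single faulty signature.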
Biham and Shamir proposed a related attack known as Differential Fault Analysis
(DFA), which applies to all common secret key cryptosystems [BS96]. In DFA, one set
of data is encrypted by a working device, while another set is encrypted by a device
with induced random faults. Comparing these sets of data can yield information about
the secret key. DFA can find the last DES round key using fewer than 200 ciphertexts,
and can break triple-DES with a similar number of ciphertexts.
The discovery of fault analysis shows the importance of verifying the correctness
of computational results for security reasons. For instance, a device that generates an
ECC signature should verify its correctness before it is issued. On detecting a faulty
computation the ciphertext needs to be suppressed to protect the secret key.
3.3 Simple Analysis on Power/EM Signals
Simple Power Analysis (SPA) uses power consumption data from a single encryption
to extract secret information [O02]. It is possible to use power traces from multiple runs
of the encryption, provided each run encrypts the same plaintext with the same secret. For
simplicity this section describes simple analysis with respect to power signals only;
however, these techniques are directly applicable to EM signals, where their counterpart
is termed Simple EM Analysis (SEMA). There are no known simple analysis results on
ECC computations on PDA devices.
Very often, keying information is extracted from a single sample due to leakage
from the execution of key-dependent code and/or the use of instructions that leak
substantial information into the side channel above the noise. The adversary is assumed
to have fairly explicit knowledge of the analyzed cryptosystem. In particular, he knows
the times at which the power consumption is correlated with part of the secret.
Different operations in encryption/decryption produce distinguishable power
signatures. For example, the ECC point addition and double operations should produce
distinguishable power signatures. An adversary examining the power consumption can
determine the sequence of operations in the encryption/decryption, thereby deciphering
the secret key of the cryptosystem.
However, SPA is not always practical because it requires that the adversary have
detailed knowledge of the encryption algorithm. Furthermore, it is often difficult to
distinguish the different operations from the power signals. This thesis describes an
innovative approach for accurately recognizing the side channel signals of different
operations.
3.4 Differential Analysis on Power/EM Signals
Differential Power Analysis (DPA) was proposed by Paul Kocher et al. in 1998 [KJJ99].
The DPA methodology can be applied to EM signals, where it is termed Differential
EM Analysis (DEMA). There are no known differential analysis results on PDA devices.
DPA relies on statistical analysis of a large number of samples in which the same
keying material operates on different data. An adversary captures the power consumption
of many runs of the encryption operation using the same secret key but different
plaintexts. The adversary discovers the secret key by guessing each key bit and using
statistical techniques to verify the guess.
The adversary begins by identifying a partition bit of a good intermediate variable
within the encryption algorithm. A good intermediate variable is one that contributes
significantly to the power consumption, perhaps one that is accessed repeatedly in the
algorithm. As well, it depends only on the plaintext and one key bit. For instance, the
input point of an ECC point operation may be a good candidate. The adversary makes an
arbitrary guess at a key bit value, and calculates the value of the partition bit based on
this guess.
The adversary divides the power consumption traces into two sets: one set for a
partition bit value of 1 and another for a value of 0. The adversary then computes the
differential signal, the difference of the average power consumption of the two sets of
traces.
When the guess at the key bit is correct, the calculated partition bit value is correct
for every trace. Consequently the differential signal shows significant spikes at the times
when the partition bit correlates with power consumption, perhaps when the bit is being
accessed. On the other hand, if the guess is incorrect, both sets have very similar average
power consumption and no noticeable spikes appear in the differential signal. Either
way, the adversary discovers the true key bit value.
The adversary then moves on to the next key bit: he guesses the bit, partitions the
traces based on this guess, and computes the differential signal, thereby recovering the
key bit value. This process continues until the entire key is discovered.
DPA is superior to SPA in that the adversary does not need specific information
about how the analyzed device implements its function. In particular, the adversary can
be ignorant of the specific times at which the power consumption is correlated with the
secret; it is only necessary that the correlation be reasonably consistent.
However, DPA requires knowledge of the plaintexts, and so applies only when the
plaintexts are known or chosen. This is not true for SPA.
3.5 Template Attack on Power/EM Signals
The template attack is derived from signal detection and estimation theory, and is
the strongest form of side channel attack possible in an information-theoretic sense
[CRR02]. There are no known results of template attacks on ECC computations of any
implementation.
While other common techniques view noise as distortion to be reduced or
eliminated, template analysis views noise as a source of information and focuses on
modeling the noise to fully extract the information present in a single sample. A
template is a model that characterizes the signal and noise of an operation, based on the
assumption that the captured signal is a linear combination of these components. It
requires the adversary to have access to an identical experimental device which he can
program to his choosing and from which he can obtain as many side channel samples as
needed. This is a reasonable requirement if the target device is a widely available
commercial product.
For a device that can perform one of K operations {O1, ..., OK}, an adversary can
use a template attack to identify the operation performed given only one sample. The
different operations could be different point operations in ECC, or the same operation
applied to different input data. A template is derived from L samples (typically one
thousand) taken on the experimental device for each of the K operations. The signal
component of the template is the average signal Mi. Typically, only a subset of N points
with large deviation is selected to build the template. The noise vector of each sample
is the difference between the sample signal and the average signal. Mathematically, the
noise vector Ni(T) for sample T of operation Oi is computed as follows:

N_i(T) = (T[1] - M_i[1], \ldots, T[N] - M_i[N])
The noise is assumed to be Gaussian, and the noise component of the template is
characterized by the noise covariance matrix computed from the L samples of operation Oi:

\Sigma_{N_i} = \begin{bmatrix}
\operatorname{cov}(N_i[1], N_i[1]) & \cdots & \operatorname{cov}(N_i[1], N_i[N]) \\
\vdots & \ddots & \vdots \\
\operatorname{cov}(N_i[N], N_i[1]) & \cdots & \operatorname{cov}(N_i[N], N_i[N])
\end{bmatrix}
Once a template is built for each of the K operations, the probability that a sample
originated from each possible operation can be calculated using the equation below, and
the most likely operation for an observed sample can then be found:

p(n) = \frac{1}{\sqrt{(2\pi)^N \, |\Sigma_{N_i}|}} \exp\!\left(-\tfrac{1}{2}\, n^{T} \Sigma_{N_i}^{-1} n\right)

where n is the observed noise vector.
4 Introduction to Elliptic Curve Cryptography
Public key cryptosystems are commonly used in digital signatures and other
authentication applications, where a verifier can check the authenticity of the signer
without knowledge of the signer's secret key. Common public key cryptosystems include
ECC, RSA and ElGamal. Although public key cryptography is vital for many security
applications, it tends to be computationally costly. The efficiency of Elliptic Curve
Cryptography (ECC) makes it suitable for resource-constrained devices such as
smartcards and PDAs.
The following sections describe the mathematics of ECC, the benefits of its use on
PDAs, common implementations, and common countermeasures against side channel
attacks.
4.1 Mathematical Overview
ECC is widely considered to be a more efficient public key system. Since it was
first proposed by Victor Miller [Miller] and Neal Koblitz [Koblitz] in 1985, it has
received much attention from the research community and industry.
The principles of ECC are similar to those of other public key cryptosystems in
that they operate on the elements of a defined cyclic group, and protocols of other public
key systems can be applied to ECC. However, ECC can provide an equivalent amount of
security using smaller key sizes. As well, it allows for many different implementations
tailored to different requirements.
An elliptic curve is a set of points (x, y) that are solutions of a bivariate cubic
equation over a finite field. An elliptic curve defined over a large prime field GF(p)
satisfies an equation of the form

y^2 = x^3 + ax + b, where a, b ∈ GF(p).
When defined over a binary field GF(2^n), the elliptic curve satisfies an equation of
the form

y^2 + xy = x^3 + ax^2 + b, where a, b ∈ GF(2^n).
ECC usually employs curves whose order is the product of a large prime and a
very small integer h, called the cofactor. The cofactor h is often 1. Over prime fields
the parameter a is often chosen to be -3 for efficient implementation. The set of points
on the elliptic curve, together with the special point O known as the point at infinity,
forms the elements of the cyclic group. The group operations are point addition and
doubling, and the point at infinity is the additive identity.
Scalar multiplication in ECC is analogous to integer exponentiation in RSA. It is
the operation of adding a point P to itself d times to compute dP. Scalar multiplication
is the dominant operation in ECC protocols and is a logical point of attack for
cryptanalysts.
4.2 Benefits for PDA Implementation
The security of ECC protocols depends on how well the scalar d is hidden in the
scalar multiplication, since this scalar is related to the secret private key. Common
mathematical attacks involve solving the elliptic curve discrete logarithm problem
(ECDLP): the problem of finding the secret scalar d given the known points P and dP.
No subexponential time algorithm is known for the ECDLP on non-supersingular
elliptic curves.
The ECDLP in ECC is analogous to the DLP in ElGamal and the integer factoring
problem in RSA; however, the ECDLP appears to be much harder over a finite field of
the same size. Therefore ECC can offer equivalent security with fewer key bits. Fewer
key bits reduce the computational and bandwidth requirements of the cryptosystem: key
size is directly related to the computational time of a cryptosystem, and a small key
requires less bandwidth to transmit. These properties make ECC ideal for resource-
constrained devices such as PDAs, which have less powerful processors, a limited power
supply and slow wireless network connectivity.
4.3 Implementation
There are many implementations of scalar multiplication. The most
straightforward is the double-and-add approach based on the binary expansion of d, as
shown in Algorithm 4-1. This implementation is used for all the experiments described
in this thesis.
One improvement to this algorithm is to convert the scalar into non-adjacent form
(NAF). The NAF of a scalar d is a signed binary expansion of d in which no two
adjacent digits are non-zero. The NAF has the fewest nonzero coefficients of any signed
binary expansion, and there exists a simple algorithm for converting to it. The modified
scalar multiplication algorithm using the NAF is the addition-subtraction method, shown
in Algorithm 4-2.
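One standard conversion algorithm (a sketch of the usual mod-4 recoding, not necessarily the variant used in the thesis experiments) is:

```python
def to_naf(d):
    """Convert a positive integer to non-adjacent form,
    least significant digit first."""
    naf = []
    while d > 0:
        if d & 1:
            digit = 2 - (d % 4)   # +1 or -1, chosen so the next bit becomes 0
            d -= digit
        else:
            digit = 0
        naf.append(digit)
        d //= 2
    return naf

# 15 = (1111)2 has NAF 16 - 1, i.e. digits (1, 0, 0, 0, -1) MSB-first
digits = to_naf(15)
value = sum(di * (1 << i) for i, di in enumerate(digits))
```

Choosing the signed digit so that d becomes divisible by 4 guarantees the next digit is zero, which is exactly the non-adjacency property.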
Another way to speed up scalar multiplication is the window method, which takes
advantage of precomputed lookup tables. Essentially, the scalar bits are divided into
appropriately sized windows, and the point multiplication by each window of bits is
found from a lookup table.
Algorithm 4-1: Double-and-Add Scalar Multiplication

Input: P and d = (dh-1, ..., d0)2
Output: Q = dP

Q := 0
for (j := h - 1; j >= 0; j--) {
    Q := dbl(Q)
    if (dj = 1) Q := add(Q, P)
}
return Q
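For reference, Algorithm 4-1 can be made executable over a small textbook curve. The curve y^2 = x^3 + 2x + 2 over GF(17) and the base point (5, 1) are illustrative toy choices only; real ECC uses fields of 160 or more bits.

```python
# Affine arithmetic on the toy curve y^2 = x^3 + 2x + 2 over GF(17)
P_FIELD, A = 17, 2

def pt_add(P, Q):
    """Add two points; None represents the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None                                   # P + (-P) = O
    if P == Q:                                        # doubling slope
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:                                             # chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)

def scalar_mult(d, P):
    """Algorithm 4-1: left-to-right double-and-add."""
    Q = None                                          # Q := 0
    for bit in bin(d)[2:]:                            # most significant bit first
        Q = pt_add(Q, Q)                              # Q := dbl(Q)
        if bit == "1":
            Q = pt_add(Q, P)                          # Q := add(Q, P)
    return Q
```

With base point (5, 1), which has order 19 on this curve, `scalar_mult(19, (5, 1))` returns the point at infinity, as expected for a point multiplied by its order.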
Algorithm 4-2: Add-Subtract Scalar Multiplication

Input: P and d = (dh-1, ..., d0)2
Output: Q = dP

Q := 0
for (j := h - 1; j >= 0; j--) {
    Q := dbl(Q)
    if (dj = +1) Q := add(Q, P)
    if (dj = -1) Q := sub(Q, P)
}
return Q

ECC and other public key systems are mostly used for authentication and key
exchange. The ECC protocol for digital signatures is the elliptic curve digital signature
algorithm (ECDSA), described in ANSI X9.62. Two ECC protocols for key exchange are
the elliptic curve Diffie-Hellman (ECDH) key exchange, described in ANSI X9.63, and a
scheme developed by Menezes, Qu, and Vanstone (ECMQV).
Public key systems are very slow at encryption compared to their symmetric key
counterparts. However, they can be used to encrypt a symmetric key shared between
communicating parties. An ECC protocol for encryption is the elliptic curve integrated
encryption scheme (ECIES), described in ANSI X9.63.

4.4 Countermeasures to Thwart Side Channel Attacks
There are three basic approaches to resisting simple analysis: indistinguishable formulas
for point operations, an identical operation sequence regardless of key bits, and random
addition chains.
Two classes of elliptic curves support the first approach. The Jacobi form and
Hesse form [LS01] [JQ01] elliptic curves achieve indistinguishability because they use
the same formulas for point doubling and addition. However, this requires specifically
chosen curves and is not generally applicable. Brier and Joye proposed an
indistinguishable addition and doubling algorithm applicable to Weierstrass form curves
[BJ02], but it fails on certain inputs, making it vulnerable to attacks [IT02b].
The second approach is applied in two scalar multiplication algorithms: double-
and-add-always, shown in Algorithm 4-3 [Cor99], and the Montgomery ladder [OS00,
Mo87]. The Montgomery ladder algorithm was later extended to general curves [BJ02,
IT02a].
Algorithm 4-3: Double-and-Add-Always Scalar Multiplication

Input: P and k = (kh-1, ..., k0)2
Output: Q = kP

Q[0] := 0
for (j := h - 1; j >= 0; j--) {
    Q[0] := ECCDBL(Q[0])
    Q[1] := ECCADD(Q[0], P)
    Q[0] := Q[kj]
}
return Q[0]

These algorithms resist simple analysis because the same sequence of point
operations is performed regardless of the scalar value; an attacker cannot decipher the
scalar bits even if the point operations are distinguishable in the side channel.
The third approach is to use a special addition chain with a sequence of additions
and doublings that can mutate randomly. One algorithm using this approach is the
randomized addition-subtraction chain [OA01]. Instead of using only addition and
double operations as in the standard scalar multiplication algorithms, this algorithm can
also use subtraction to perform scalar multiplication. The advantage is that there may be
plural addition-subtraction chains for a given scalar; this follows directly from the fact
that signed digit representations are redundant. For instance, there is a unique binary
representation (1111)2 for the number 15, but the number 15 has multiple signed digit
representations such as (1000Ī)SD and (10Ī11)SD. This allows the algorithm to choose
among permutations of the three point operations to perform scalar multiplication,
making simple and differential analysis difficult.
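The signed-digit redundancy claimed above is easy to check directly; this small sketch evaluates the binary form and the two signed-digit forms of 15:

```python
def sd_value(digits):
    """Evaluate a signed-digit representation, most significant digit first."""
    v = 0
    for d in digits:
        v = 2 * v + d   # Horner's rule in base 2 with digits in {-1, 0, 1}
    return v

# (1111)2, (1000-1)SD and (10-111)SD, using -1 for the digit written as a bar
reps = [[1, 1, 1, 1], [1, 0, 0, 0, -1], [1, 0, -1, 1, 1]]
values = [sd_value(r) for r in reps]
```

All three evaluate to 15, so a randomized recoder can pick among them, and the resulting sequence of add/double/subtract operations differs from run to run.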
Scalar multiplication algorithms that are secure against simple analysis may still be
vulnerable to differential analysis. Fortunately, it is easy to enhance an algorithm to
resist differential analysis as well. There are two general approaches: randomizing the
base point P and randomizing the scalar k.
One application of the first approach is point blinding [Cor99], which blinds the
side channel information by adding a random point R to the input and subtracting a
random point S from the output of a scalar multiplication algorithm that is already
resistant to simple analysis. The point S equals the point kR, hence the final result is
computed correctly.
The points R and S are updated before each multiplication to reduce leakage of
these values in the side channel. They can be conveniently updated as follows:

R := (-1)^b * 2R,  S := (-1)^b * 2S,  where b is a random bit.
The second application of base-point randomization is projective randomization
[Cor99]. There are many varieties of projective coordinates [BHL00]. A point
represented in Jacobian coordinates, P = (X:Y:Z), for example, can be equivalently
expressed as (r^2·X : r^3·Y : r·Z), where r is any nonzero finite field element. The
countermeasure transforms the base point (X:Y:Z) into (r^2·X : r^3·Y : r·Z) with a
random r before starting the scalar multiplication, which effectively randomizes any side
channel information.
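The equivalence of the randomized representation can be verified numerically. The field size and coordinate values below are illustrative toys chosen for the example, not thesis parameters.

```python
# Jacobian coordinates: affine x = X/Z^2, y = Y/Z^3 over GF(p)
p = 17
X, Y, Z = 6, 3, 1          # toy projective point
r = 5                      # random nonzero field element

# Randomized representation (r^2*X : r^3*Y : r*Z)
Xr, Yr, Zr = (r * r * X) % p, (r ** 3 * Y) % p, (r * Z) % p

def to_affine(X, Y, Z, p):
    """Map Jacobian coordinates back to affine (x, y)."""
    zinv = pow(Z, -1, p)
    return (X * zinv ** 2 % p, Y * zinv ** 3 % p)

# Both representations collapse to the same affine point
same = to_affine(X, Y, Z, p) == to_affine(Xr, Yr, Zr, p)
```

The intermediate projective values differ completely between runs with different r, which is what decorrelates the side channel, yet the affine result is unchanged.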
The last application of base-point randomization was proposed by Joye and Tymen
[JT01] and is based on randomly selected isomorphisms between elliptic curves. The
base point P = (x, y) and the curve parameters a, b can be randomized into
P' = (r^2·x, r^3·y) and a' = r^4·a, b' = r^6·b, which randomizes any side channel
information.
One application of scalar randomization is scalar blinding [Cor99], which
randomizes the scalar k into (k + r·#E(K)), where #E(K) is the order of the elliptic curve
group over the field K and r is a random number.
Another application of this approach is randomized multiplier recoding [JT01],
which applies to Koblitz curves over GF(2^m). The algorithm randomly chooses one of
multiple NAF expansions of the scalar k.
Some countermeasures against differential analysis are designed for the window
method of scalar multiplication. The first is the Overlapping Window Method
[IYTT02], which counters differential analysis by overlapping adjacent windows, thus
allowing plural possible window values for a given scalar k. The second is the
Randomized Table Window Method [IYTT02], which counters differential analysis by
randomizing the precomputed table and normalizing the randomized data to obtain the
correct final result.
5 Proposed Methodology of DEMA
This chapter presents the proposed methodology of DEMA against an ECC
implementation on a PDA. There are no known DEMA results for ECC computations on
PDA devices.
The concept of differential EM analysis (DEMA) was first proposed by Quisquater
et al. [QS01], and is modeled after DPA. It involves statistical analysis of multiple traces
of EM emanations.
As discussed in section 2.5, the EM side channel is particularly devastating for
PDA devices due to their mobility and device characteristics. The point of attack in ECC
algorithms is the scalar multiplication, as it is the dominant operation in ECC
cryptographic operations. In most cases, the base point of the scalar multiplication is a
fixed system parameter, and the scalar is derived from the secret private key and some
random secret. Recovering the scalar would compromise the confidentiality of the secret
private key, and hence the scalar must be protected. The difficulty of the ECDLP makes
it hard to find the secret scalar even with knowledge of the input and output points. The
goal of DEMA is to recover this secret scalar from the EM side channel with
significantly fewer computational resources and less time.
In DEMA the attacker splits the EM traces into two sets, depending on a guess of
one or a small group of scalar bits. If the guess is correct, the attacker should detect
significant differences in the average EM signals of the two sets.
An optimal way to perform trace splitting is described in section 5.1. After the
traces are split, statistical techniques are applied to determine whether the difference
between the two sets is significant; this verifies the guess and consequently recovers the
secret scalar. This process is described in section 5.2. It was found that the frequency
domain and the spectrogram may also be useful in DEMA; the analogous processes are
described in section 5.3.
The discussion of DEMA focuses on algorithms that are resistant to simple
analysis, as SEMA is more suitable for attacking those that are vulnerable to simple
analysis. Different DEMA strategies apply to different scalar multiplication algorithms.
The simplest case, described in section 5.4, is DEMA on a scalar multiplication in which
the attacker can clearly distinguish between double and addition operations. However,
some scalar multiplication algorithms use indistinguishable formulas for point addition
and doubling; DEMA on such algorithms is more difficult and is described in section
5.5. Finally, DEMA on the window method, the most common and efficient way of
performing scalar multiplication, is described in section 5.6.
5.1 Proposed Trace Splitting Strategy
The attacker must pick a partial value for trace splitting that depends on the input
point and varies with each part of the scalar. It is best, from the attacker's point of view,
if this value is affected by a small part of the scalar at each iteration of the algorithm.
For example, in the standard scalar multiplication, one should pick a partial value
that changes as each scalar bit is processed. In the window method, a desirable partial
value would change after each window of the scalar is processed. Only one bit value is
needed to partition the traces into two bins. Optimally, the attacker should pick the
partial variable, and the bit of this variable, that have the biggest impact on the EM
emanation.
Some literature claims that the Hamming weight of the partial value affects the
magnitude of the emanation. However, there is no evidence that a function operating on
an operand of higher Hamming weight produces a higher or lower level of EM
emanation. A register built in CMOS technology does not dissipate more energy simply
because more of its bits are set to 1, since there is no static current in CMOS transistors.
On the other hand, the Hamming distance of a register value change may correlate with
energy dissipation and EM emanation.
In fact, each bit has a differing impact on the EM side channel. The best partition
bit appears to be the most significant bit (MSB) of an input point coordinate (the x or y
coordinate) of a point operation. It is simple to see that an input point has a great impact
on the computations within a point operation: the coordinates of the input are used in
numerous prime field computations over GF(p), such as squaring, multiplication, and
addition. The reason for choosing the MSB as the partition bit is that when the MSB is
1, there is a much higher probability that subsequent finite field computations with the
coordinate will produce a carry-out, which triggers an avalanche of different
computations producing distinguishable side channel signals.
Typically, extra computations are performed to handle a carry-out in a finite field
operation. For example, when a carry-out occurs during the modular addition shown in
algorithm 5-1, the prime modulus p must be subtracted from the result. This extra field
subtraction produces a distinguishable signal in the EM side channel.
Algorithm 5-1: Modular Addition
Input: A modulus p, and integers a, b ∈ [0, p−1].
Output: c = (a + b) mod p.
1. c0 ← Add(a0, b0).
2. For i from 1 to t−1 do: ci ← Add_with_carry(ai, bi).
3. If the carry bit is set, subtract p from c = (c_{t−1}, …, c1, c0).
4. If c ≥ p then c ← c − p.
5. Return(c).
Similarly, when a borrow occurs during the modular subtraction in algorithm 5-2, the
modulus p must be added to the result. Again, the extra addition produces
distinguishable EM signals.
Algorithm 5-2: Modular Subtraction
Input: A modulus p, and integers a, b ∈ [0, p−1].
Output: c = (a − b) mod p.
1. c0 ← Subtract(a0, b0).
2. For i from 1 to t−1 do: ci ← Sub_with_borrow(ai, bi).
3. If the borrow bit is set, add p to c = (c_{t−1}, …, c1, c0).
4. Return(c).
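The conditional reduction in algorithms 5-1 and 5-2 can be sketched as follows. This is a big-integer sketch, not the multi-word implementation described above, and the NIST P-192 prime is assumed purely for illustration:

```python
# Sketch of Algorithms 5-1 and 5-2 using Python big integers. The branch taken
# only on a carry-out (or borrow) is the data-dependent step that leaks into
# the EM side channel.
P = 2**192 - 2**64 - 1  # NIST P-192 prime, assumed here for illustration

def mod_add(a, b, p=P):
    c = a + b
    if c >= p:   # carry-out: the extra subtraction emits a distinguishable signal
        c -= p
    return c

def mod_sub(a, b, p=P):
    c = a - b
    if c < 0:    # borrow: the extra addition emits a distinguishable signal
        c += p
    return c
```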
The coordinates of the ECC points are finite field elements. In the DblJJ point operation,
the x-coordinate of the input is fed directly as input to three finite field operations: scalar
multiplication by 4, modular addition, and modular subtraction. Scalar multiplication by
4 is implemented as a bitwise shift to the left, so a carry-out occurs whenever one of the
two most significant bits of the field element is one. This carry-out can be observed in
the EM side channel.
The MSB of the x-coordinate also changes the probability of a carry-out in the
addition operation. When the MSB is 1, the probability of a carry-out in an addition
between the x-coordinate and a random finite field element is approximately ¾; when
the MSB is 0, the probability is approximately ¼.
To see why, consider the addition of the x-coordinate and a random element over
GF(p): when the x-coordinate equals n, there are n different values of the random
element that cause a carry-out. When the MSB is 0, the x-coordinate ranges from 0 to
2^191 − 1, so the number of ways a carry-out can occur is the sum of the arithmetic
series from 0 to 2^191 − 1. When the MSB is 1, the x-coordinate ranges from 2^191 to
2^192 − 1, so the count is the sum of the arithmetic series from 2^191 to 2^192 − 1. The
total number of combinations of an x-coordinate (with its MSB fixed) and a random
element is 2^191 × 2^192 = 2^383 ≈ p²/2. For simplicity, the calculations below assume
p ≈ 2^192. The calculations are as follows.
When MSB = 0:
  P(overflow) ≈ (1/2^383) · Σ n,  for n = 0 to 2^191 − 1
  P(overflow) ≈ (1/2^383) · (2^191 − 1)(2^191)/2
  P(overflow) ≈ (1/2^383) · 2^381
  P(overflow) ≈ 1/4

When MSB = 1:
  P(overflow) ≈ (1/2^383) · Σ n,  for n = 2^191 to 2^192 − 1
  P(overflow) ≈ (1/2^383) · (2^192 − 2^191)(2^192 + 2^191 − 1)/2
  P(overflow) ≈ (1/2^383) · (2^191)(3 · 2^191)/2
  P(overflow) ≈ (1/2^383) · 3 · 2^381
  P(overflow) ≈ 3/4
Using a similar analysis, one can show that the MSB of the x-coordinate changes the
probability of underflow in the subtraction operation. When the MSB is 1, the
probability of underflow in a subtraction between the x-coordinate and a random finite
field element is approximately ¼; when the MSB is 0, the probability is approximately ¾.
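The claimed carry-out probabilities are easy to check numerically. The following simulation is my own sketch, not from the thesis: a 16-bit field stands in for the 192-bit one, and the modulus is idealized to a power of two as in the approximation above.

```python
# Monte Carlo check: adding an x-coordinate to a uniformly random field element
# overflows with probability ~3/4 when the MSB of x is 1, and ~1/4 when it is 0.
import random

random.seed(1)
BITS = 16
p = 2**BITS  # idealized modulus, analogous to p ~ 2^192 in the text

def carry_prob(msb, trials=100_000):
    hits = 0
    for _ in range(trials):
        x = random.randrange(p // 2, p) if msb else random.randrange(0, p // 2)
        r = random.randrange(0, p)
        hits += (x + r) >= p  # carry-out of the addition
    return hits / trials

p_msb1 = carry_prob(1)
p_msb0 = carry_prob(0)
```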
In summary, since the MSB value influences, both deterministically and probabilistically,
whether modulo reduction occurs in the finite field operations, it correlates with the
signals in the EM side channel.
With MSB partitioning, a correct guess of the key bits generates distinguishable
spikes in the differential signal. Assuming the guesses of the key bits are correct, the
attacker obtains one bin of traces from a point operation with a high frequency of
carry-outs, and another bin from a point operation with a low frequency of carry-outs,
as shown in figure 5-1.
Figure 5-1: MSB Partitioning
5.2 Proposed Differential Analysis of Traces
Once the traces are partitioned, statistical techniques are applied to the sets to verify the
scalar bit guess. For simplicity, the case where the attacker is guessing one key bit at a
time will be considered. The more complicated case where the attacker must guess a
window of bits at a time is described in section 5.6.
Essentially, if the bit guess is correct, there should be a significant difference
between the averages of the two sets of signals. However, one needs a systematic way to
decide whether the differential signal is significant by comparing it to a reference signal.
The simplest reference is a constant signal, as in figure 5-2, where the significance of a
peak is its multiple of this reference. However, this reference does not account for the
varying degree of noise in the EM signals at different time points.
[Figure: differential signal plotted against a constant reference line, amplitude versus time]
Figure 5-2: Constant Reference Signal
A better reference signal is the standard deviation of difference of means
(SD-DOM), computed as in algorithm 5-3. The SD-DOM is a measure of the variability
of the EM traces: it peaks at time points with high variability.
Algorithm 5-3: Standard Deviation of Difference of Means
Input: V0, V1 = the two sets of signals (trace bins).
Output: b = standard deviation of difference of means.
STD() = standard deviation function; SIZE() = size of a signal set.
SD_DOM(V0, V1):
1: D0 ← STD(V0)
2: D1 ← STD(V1)
3: u0 ← SIZE(V0)
4: u1 ← SIZE(V1)
5: R ← sqrt(D0²/u0 + D1²/u1)
6: return R
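With NumPy, the SD-DOM reference of algorithm 5-3 reduces to a few lines. This is a sketch; the array layout, one trace per row, is my assumption:

```python
# SD-DOM reference signal: pointwise standard deviation of the difference of
# the two bin means, computed from the per-bin stds and the bin sizes.
import numpy as np

def sd_dom(v0, v1):
    v0, v1 = np.asarray(v0, float), np.asarray(v1, float)
    d0 = v0.std(axis=0)   # pointwise std of bin 0
    d1 = v1.std(axis=0)   # pointwise std of bin 1
    return np.sqrt(d0**2 / len(v0) + d1**2 / len(v1))
```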
In figure 5-3, the SD-DOM signal indicates that the EM traces are noisy from time
point 2 to 3, while they are quiet from time point 5 to 6. Even though there are two
identical peaks in the differential signal, the second one is considered more significant
because it occurs at time points where the EM traces are quiet. If the constant reference
signal were used, as in figure 5-2, both peaks would be considered equally significant.
[Figure: differential signal plotted against the SD-DOM reference, amplitude versus time]
Figure 5-3: SD-DOM Reference Signal
Still, even armed with this definition, there are multiple ways to decide whether the
overall differential signal is significant. One way is to simply take the most significant
peak and calculate the ratio of its amplitude to the reference. This works well when the
differential signals for correct and incorrect bit guesses are easily distinguishable, and the
most significant peaks of the two signals are far apart, as shown in figure 5-4. However,
for noisy signals from PDA devices, the most significant peaks resulting from correct and
incorrect guesses are sometimes close together, as in figure 5-5.
[Figure: differential signals for a correct and a wrong guess with well-separated peaks, shown against the reference]
Figure 5-4: Ideal Differential Signal
[Figure: differential signals for a correct and a wrong guess with peaks close together, shown against the reference]
Figure 5-5: Actual Differential Signal
A better approach is to consider a top percentile of peaks. A reasonable criterion for a
significant differential signal would require, for example, that the most significant one
percentile of peaks be greater than κ · SD_DOM. The most suitable value for the
coefficient κ should be found experimentally. Algorithm 5-4 defines how to calculate the
ratio between the differential signal and the reference signal.
Algorithm 5-4: Ratio between Differential and Reference Signal
Input: T0, T1 = the two bins of EM traces; m = number of time points per EM trace.
Output: R = ratio between the differential and reference signal.
MEAN() = mean of a signal set.
DEMA(T0, T1, m):
1: S ← SD_DOM(T0, T1)
2: D ← MEAN(T1) − MEAN(T0)
3: for each t, t ∈ {0, 1, …, m−1}:
4:   R(t) ← abs(D(t)) / S(t)
5: return R
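A NumPy sketch of algorithm 5-4, again with the assumed one-trace-per-row layout and the SD-DOM reference inlined:

```python
# Ratio between differential and reference signal: the differential signal is
# the difference of the bin means, and each time point is scored as a multiple
# of the SD-DOM reference at that point.
import numpy as np

def dema_ratio(t0, t1):
    t0, t1 = np.asarray(t0, float), np.asarray(t1, float)
    s = np.sqrt(t0.std(axis=0)**2 / len(t0) + t1.std(axis=0)**2 / len(t1))
    d = t1.mean(axis=0) - t0.mean(axis=0)  # differential signal
    return np.abs(d) / s                   # significance per time point
```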
5.3 Proposed Differential Analysis in Frequency and Spectrogram
Previous research on differential analysis focuses on the analysis of time domain
signals. However, differential analysis is also applicable to frequency domain signals. In
theory, peaks in a time domain differential signal should also appear in the frequency
domain, as any change in a time domain signal induces changes in the frequency domain
signal. The frequency domain is particularly useful for second-order differential analysis,
where correlation exists between the EM emanations at two time points [WW2]. As well,
frequency analysis may reveal loops and other repeating structures in an algorithm that
cannot be seen with time domain analysis. More importantly, frequency signals are less
sensitive to the random jitters and delays of time signals caused by slight variations in
instruction execution times, which are very common in more advanced mobile devices.
However, there are two problems with using frequency domain signals in
differential analysis. First, they reveal no information about when data-dependent
operations occur; this timing information is very useful, as it helps an adversary focus the
signal analysis on those operations. Second, peaks in the frequency domain caused by an
event of short duration may be indiscernible if the acquisition duration is much longer.
The solution to both problems is the spectrogram, a time-dependent frequency analysis.
Time domain analysis and frequency domain analysis are essentially two special
cases of spectrogram, as shown in figure 5-6. Frequency domain analysis is equivalent to
spectrogram with one time window. Time domain analysis is equivalent to spectrogram
with the same number of time windows as sample points. The window size is
appropriately chosen to find the optimal balance between the advantages and
disadvantages of the time domain and frequency domain analysis.
Figure 5-6: Relationship between Different Signals
The steps for creating a spectrogram are shown in algorithm 5-5. The first
component is the Fast Fourier Transform (FFT), which produces a frequency domain
signal; by the Nyquist criterion, the size of this frequency signal is half the size of the
time window. The second component is an element-wise product between the frequency
signal and a Hamming window. The Hamming function suppresses the Gibbs
phenomenon in spectral windowing [OSB99]. The remainder of the DEMA algorithm
proceeds in the same way as for time-domain analysis.
It is sometimes advantageous to have overlap between adjacent time windows.
Overlap reduces the loss of frequency information for signal features close to the edges
of windows. The optimal amount of overlap should be found experimentally, but a good
value is half of the window size.
Algorithm 5-5: Spectrogram
Input: T = EM traces; n = number of EM traces; p = number of time windows; w = number of time points within a time window.
Output: V = spectrogram of EM traces.
HAMMING() = Hamming spectral window function.
SPECGRAM(T):
1: for each trace i, i ∈ {0, 1, …, n−1}, and for each window b, b ∈ {0, 1, …, p−1}:
2:   F ← FFT(Ti(b·w : (b+1)·w − 1))
3:   F ← first w/2 + 1 points of F
4:   Vi,b ← F • HAMMING(w/2 + 1)
5: end for
6: return V
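A minimal spectrogram along the lines of algorithm 5-5 can be sketched as follows. Window placement, the use of magnitude spectra, and the exact taper are my assumptions:

```python
# Spectrogram of one trace: slide a w-point window across the trace, take the
# magnitude of the real FFT (w/2 + 1 frequency points, per Nyquist), and taper
# with a Hamming window as the text describes. `overlap` shifts windows by
# less than their full width to avoid losing features at window edges.
import numpy as np

def spectrogram(trace, win, overlap=0):
    step = win - overlap
    rows = []
    for start in range(0, len(trace) - win + 1, step):
        spec = np.abs(np.fft.rfft(trace[start:start + win]))
        rows.append(spec * np.hamming(len(spec)))
    return np.array(rows)
```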
5.4 Attack Strategy on Known Point Operation
Algorithms such as double-and-add-always [Cor99] and the Montgomery ladder [Mo97]
have a consistent sequence of point operations. While this prevents simple analysis, it
aids differential analysis, as the locations of the point operations can be identified and the
attacker can focus on the double operations. A differential analysis on the (n+1)th
ECCDbl operation, using the MSB of an input coordinate for trace splitting, can reveal
the nth key bit. The attacker can apply this strategy iteratively to recover the entire key.
5.5 Proposed Attack Strategy on Unknown Point Operation
Jacobi form and Hesse form elliptic curves [LS01] [JQ01] use the same algorithms for
point addition and doubling. Not only does this approach prevent simple analysis, it also
makes differential analysis more difficult: an attacker can only perform differential
analysis on double operations, as no addition operation is performed for key bits of zero.
Given that the attacker knows the mth operation to be the double operation for the nth
key bit, the attacker can guess that the nth key bit is one, and therefore assume that the
(m+2)th operation is the double operation for the (n+1)th key bit. Performing differential
analysis on the (m+2)th operation then verifies whether the nth key bit is 1. It is not
useful to perform differential analysis on the (m+1)th operation, since its input is the
output of the mth operation and is the same regardless of the nth key bit value.
The attacker can apply this strategy iteratively, starting from the first operation,
which is known to be the double operation for the first key bit, as shown in figure 5-7.
Given that the mth operation is ECCDBL for the nth key bit, where Q is the output of the ECCDBL and P is the base point:
If the nth bit is 0: the (m+1)th operation is ECCDBL with input Q; the (m+2)th operation is unknown, with input 2Q.
If the nth bit is 1: the (m+1)th operation is ECCADD with input Q; the (m+2)th operation is ECCDBL, with input Q+P.
Figure 5-7: Attack Strategy on Unknown Point Operation
5.6 Proposed Attack Strategy on Window Method
Elliptic curve scalar multiplication can be implemented more efficiently with a window-
based method, which operates on the scalar multiplier in base 2^w for some window size
w greater than one. The window method works in a similar way to standard scalar
multiplication, except that the point operations are applied to one scalar digit d_i of w
bits at each iteration of the algorithm. The multiplication of a digit d_i by the base point
is pre-computed and stored in a lookup table.
The window method is vulnerable to differential analysis, as an attacker can
capture side channel traces from the double operation and partition them using the MSB
of the input point. However, the attack is made more difficult because the input of the
double operation depends on w key bits, and the attacker must try all 2^w possible values
of those bits to find the values that produce the strongest differential signal.
[Bar chart: differential signal magnitude for key bit guesses 0x00, 0x01, 0x10 and 0x11, with 0x10 clearly the highest]
Figure 5-8: Differential Signal for Different Key Bits
Figure 5-8 shows the differential signals resulting from different key bit guesses,
with w = 2 in this window method. Clearly, the key bits 0x10 lead to the highest
differential signal, and these bits are most likely to be correct.
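The digit search can be sketched as follows; score() is a hypothetical callback standing in for a full DEMA pass over the traces:

```python
# Window-method key search: score each of the 2^w candidate digit values by
# the peak of its differential signal and keep the strongest candidate.
def best_digit(score, w):
    return max(range(2**w), key=score)

# toy scores for a 2-bit window, mirroring figure 5-8: digit 0b10 peaks highest
peaks = {0b00: 1.1, 0b01: 0.9, 0b10: 4.2, 0b11: 1.3}
guess = best_digit(peaks.__getitem__, 2)
```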
6 Proposed Methodology of SEMA
The goal of SEMA is to recover the secret scalar by identifying the sequence of ECC
addition and double operations. This chapter describes an innovative approach to
classifying the point operations from EM signals using the AI neural network
programming paradigm. Neural networks are used in many intelligent voice recognition
and image recognition systems; other applications include the detection of medical
phenomena, stock market prediction, and engine management [S96]. However, this is the
first known use of AI neural networks for cryptanalysis.
Neural networks are effective at solving difficult problems for which there are no
algorithmic solutions or simple mathematical structures in the input data [S96], and the
recognition of complex EM signals belongs to this class of difficult problems. In contrast,
the template attack assumes the noise can be modeled as Gaussian and classifies the
operations using optimal signal detection and estimation theory. Thus far, that approach
has been assumed to be the optimal way to classify operations.
To evaluate the effectiveness of the neural network approach, it is compared to the
template attack approach; this comparison forms the basis for using neural networks and
is discussed in the first section of this chapter. The second section provides an overview
of neural network structures. The remaining sections describe the preprocessing, training
and classification procedures in neural network systems.
6.1 Motivation of using Neural Network
A template comprises signal and noise models of an operation, and is developed from
signals acquired over many executions of that operation. The signal component is the
average signal, and the noise component is a multivariate Gaussian model of the noise.
However, a template is a poor representation of an EM signal when there is a large
amount of jitter (random time shifts) in the signals, as illustrated in figures 6-1 and 6-2
below.
[Figure: two time-shifted signal samples from operation A and their average]
Figure 6-1: Signal Component of Operation A Template
[Figure: two time-shifted signal samples from operation B and their average]
Figure 6-2: Signal Component of Operation B Template
Each figure shows two signal samples of the same operation along with the signal
average. It is obvious that the two samples are identical except that one is a time shift of
the other. The signals from operation A are clearly different from those of operation B;
in particular, those from operation A have a lower frequency and have durations in which
the signal is unchanging.
However, the signal averages, that is, the signal components of the two templates,
are identical. The noise components due to jitter would not provide any useful
identification information either. Therefore, the template approach of modeling signals
fails in the presence of jitter.
In general, a template model performs poorly whenever it encounters EM signal
samples that are not a simple linear combination of a fundamental EM signal and
Gaussian noise. The EM signal produced by a mobile device is not consistent across
executions, due to the software runtime environment (section 7.2) and the trigger signal
(section 7.4). Hence the noise can be thought of as a random non-linear transformation
of the fundamental EM signal.
It is conceivable that a very elaborate model could be developed for the EM signal
produced by the addition and double operations. The construction of such a model
would require detailed models of the PDA run-time environment and of the trigger signal.
A model of the trigger signal is relatively easy to create; a model of the PDA run-time
environment is extremely challenging to construct. Therefore, there is no simple model
describing the EM signals generated by the addition and double operations. In this
thesis, the proposed methodology instead uses AI neural networks, commonly deployed
in speech and image recognition, to classify addition and double operations from EM
signals.
A neural network is modeled after the biological neural system in our brains [B95].
Real brains, however, are orders of magnitude more complex than any existing artificial
neural network. Two main characteristics of neural networks make them suitable for this
research problem. First, neural networks are capable of modeling extremely complex
and non-linear functions [R96]. Linear modeling is often used in engineering because it
has well-known optimization strategies; however, in situations where linear
approximations are not valid, linear models fail. Secondly, neural networks are relatively
easy to implement: they are capable of learning how to do their task given the right
training data, and the algorithms and data structures of neural networks are portable
across vastly different applications.
There are two general classes of neural networks: unsupervised networks and
supervised networks. An unsupervised network adapts purely in response to its inputs.
Such networks can learn to pick out structure in their input, and are useful for data
clustering and for reducing the dimensionality of data [B95]. For this application,
supervised networks are more useful. A supervised network is trained with sample inputs
and the desired results from the point operations.
A neural network can accept only a limited number of inputs. However, each
acquisition of an EM signal has tens of thousands of sample points with considerable
redundancy. The solution is to preprocess the data using a signal envelope and data
compression. The preprocessed data can then be used for training and classification with
the neural network.
6.2 Neural Network Structure
A neural network consists of many simple computational units called neurons that are
organized in layers, as shown in figure 6-3 below.
Figure 6-3: Neural Network Structure [S96]
An artificial neuron is similar to its biological counterpart; it receives a number of
(scalar) inputs and produces an output. The neurons are connected in the sense that the
outputs of one layer of neurons are fed to the inputs of the next layer. More precisely,
a neuron of layer i sends its output to every neuron of layer i+1; conversely, a neuron of
layer i+1 receives an output from every neuron of layer i.
Each connection between neurons is associated with a weight, and each neuron
has a bias value. The activation input value is formed from the weighted sum of the
inputs plus the bias. The scalar output of the neuron is a transfer function of the
activation input. This computation of a layer of neurons is shown mathematically as
follows:
A = f(WP + B)

where
  P = input vector from layer i−1
  A = output vector of layer i
  W = connection weight matrix between layers i−1 and i
  B = neuron bias vector of layer i
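The layer computation A = f(WP + B) can be sketched with NumPy; the shapes and the toy weights here are assumptions:

```python
# One layer of the network: A = f(WP + B), with f the Tan-Sigmoid (tanh).
import numpy as np

def layer(P, W, B):
    return np.tanh(W @ P + B)

# toy layer: two inputs feeding a single neuron
W = np.array([[0.5, -0.5]])
B = np.array([0.0])
```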
The activation function used is the Tan-Sigmoid transfer function, shown below [MW].
Figure 6-4: Tan-Sigmoid Transfer Function [MW]
A single neuron can accomplish very little. However, a large number of neurons
organized in a multi-layer network can have great power.
The neural network used here consists of four layers of neurons, as is typical of
neural networks for pattern recognition. The neurons at the first layer (input layer)
receive the preprocessed EM signals as their input. There are two hidden layers. The last
layer (output layer) has only one neuron: if the output of this neuron is a positive number,
the signal is classified as a double operation; if the output is negative, it is classified as an
addition operation. The optimal number of neurons for the other layers is found
experimentally, and the procedure and outcome are described in section 9. In general, a
network with more neurons is better able to model complex patterns. However, it also
requires more training data and is more computationally demanding.
6.3 Preprocessing
There are basically three steps of preprocessing of the input data: taking signal envelope,
scaling of signal envelope, and compression of the scaled envelope. There are two goals
of input preprocessing. The first goal is to reduce the number of inputs, which reduce the
complexity of the neural network and the number of training data required. The second
goal is to improve the recognition quality.
[Figure: a fluctuating EM signal overlaid with its envelope]
Figure 6-5: Signal Envelope
An envelope of a sample EM signal is shown in figure 6-5. Each segment in the
envelope represents the maximal value of a four-point window of the original signal.
This effectively reduces the signal to a quarter of its original size. Although the envelope
reduces the information content of the original signal, local signal fluctuation does not
contribute to recognition quality, as an EM voltage signal naturally fluctuates between
positive and negative values. One may be tempted to avoid the envelope and instead
reduce the resolution appropriately. However, the envelope is important because it
provides the approximate positive amplitude of the signal. Moreover, it is not a good
idea to use the average value instead of the maximal value, as the average value of an
EM signal is approximately zero.
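The envelope step can be sketched as follows. This is my implementation, not the thesis code, and it drops any ragged tail shorter than one window:

```python
# Envelope extraction: the maximum of each non-overlapping four-point window,
# shrinking the trace to a quarter of its length.
import numpy as np

def envelope(signal, win=4):
    signal = np.asarray(signal, float)
    n = len(signal) // win * win        # drop a ragged tail, if any
    return signal[:n].reshape(-1, win).max(axis=1)
```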
The second step is to scale the training data such that the network inputs are
normalized to zero mean and unit standard deviation. This is shown in the example
below, where each column represents a signal and each row of a column represents a
signal point.
[Worked example: a small matrix of integer signal values and its normalized counterpart with zero mean and unit standard deviation]
The means and standard deviations of the original inputs are stored and used to
transform future inputs to the network. Stated mathematically:

Let
  M = mean vector of the training input
  S = reciprocal of the standard deviation vector of the training input
  B = new input vector
  A = new input vector after scaling

A = (B − M)S
Although this process does not reduce the data size, normalization is vital to
increasing recognition accuracy. This is partly due to the activation function used in the
neurons, which is most sensitive to inputs between −1 and +1.
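The scaling step, with the stored training statistics reused on new inputs, can be sketched as follows; the layout with rows as signal points and columns as signals is my reading of the text:

```python
# Normalize each signal point to zero mean and unit standard deviation across
# the training signals; store M and the reciprocal std S to scale new inputs.
import numpy as np

def fit_scaler(train):                      # rows = signal points, cols = signals
    M = train.mean(axis=1, keepdims=True)
    S = 1.0 / train.std(axis=1, keepdims=True)
    return M, S

def scale(B, M, S):
    return (B - M) * S                      # A = (B - M) S
```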
The final step compresses the input data using principal component analysis.
Principal components define projections that capture the maximum amount of variation
in a dataset, each orthogonal to the previous principal components of the same dataset
[YR01]. The technique is useful in applications where the dimension of the input vector
is large but the components of the vector are highly correlated. This applies to EM signal
vectors: each vector contains tens of thousands of signal points, yet there are considerable
correlations between the points. Principal component analysis is used to reduce this
redundancy in the input vector.
There are two steps to principal component analysis. The first step is to
orthogonalize the components of the input vector so that they become uncorrelated, as
illustrated in the transformation below.
[Worked example: the normalized matrix is transformed so that each row becomes an uncorrelated (orthogonal) component]
Each column still represents a signal; however, each row now represents an orthogonal
component. The rows are ordered so that the orthogonal components (principal
components) with the largest variation appear in the top rows. The final step eliminates
the orthogonal components that contribute less than a predefined fraction of the total
variation in the set. This is illustrated in the transformation below, where the orthogonal
components contributing less than 30% (in this case, the last row) have been eliminated.
This reduces the amount of input data by removing the components that carry the least
information.
[Worked example: the last row of the orthogonalized matrix, which contributes the least variation, is removed]
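Both PCA steps, orthogonalizing and then dropping low-variation components, can be sketched via a singular value decomposition. This is my sketch; the 30% threshold is the example value from the text:

```python
# PCA compression: project the (centered) signal points onto the principal
# directions, ordered by variance, then keep only the components contributing
# at least `frac` of the total variation.
import numpy as np

def pca_compress(X, frac=0.3):              # rows = signal points, cols = signals
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U.T @ Xc                            # rows = principal components
    var = s**2 / np.sum(s**2)               # fraction of variation per component
    return Y[var >= frac]
```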
6.4 Training
The type of neural network used for this application is a backpropagation network.
Backpropagation is a gradient descent algorithm that adjusts the network weights along
the negative gradient of the performance function, as illustrated in figure 6-6 [ANC].
The term backpropagation refers to the manner in which the gradient is computed for
non-linear multi-layer networks.
Figure 6-6: Gradient Descent Algorithm [ANC]
Training a neural network requires a set of input vectors and a target vector.
In this application, the input vectors are the preprocessed EM signals. The target vector
comprises ones and negative ones, representing double and addition operations
respectively.
The standard steepest descent algorithm is often too ineffective for most
applications. There are two main categories of improvements. The first category uses
heuristic techniques developed from analysis of the performance of the standard steepest
descent algorithm. One heuristic is to add momentum to the gradient descent; this
enables the network to respond not only to the local gradient but also to recent trends on
the error surface, and prevents the network from getting stuck in a shallow local
minimum that would yield a locally optimal (but globally suboptimal) solution [MW].
The second category uses numerical optimization techniques for neural network training.
The training techniques tested experimentally are CGB, CGP, GDX, SCG and RP; they
are described in [MW].
The distribution of training data should reflect the distribution of the actual test data.
For example, in scalar multiplication based on the binary expansion of the scalar, the
double operation occurs approximately twice as often as the addition operation.
Therefore, one third of the training data should come from addition operations and two
thirds from double operations. The main reason for using the same distribution is to
ensure that the means and standard deviations of the training data are approximately the
same as those of the test data, which is important for proper preprocessing. The second
reason is that it is advantageous for the network to decide on the more probable outcome
when it encounters an ambiguous input. Here, an ambiguous input is a signal that does
not fit the usual patterns of either the addition or the double operation, and is hence
difficult to classify. When the network is trained with more data from double operations,
it is more likely to classify the source of an ambiguous signal as a double operation.
6.5 Classification
The preprocessed EM signal is fed into the input layer of the neural network. The values
are propagated through the layers of the network to the output layer. The output layer
has only one neuron, whose output lies between +1 and −1. If the output is a positive
number, the source of the EM signal is classified as a double operation; otherwise, it is
classified as an addition operation.
6.6 Combination of Classification Results
The classification results are numbers between +1 and −1. An adversary can greatly
enhance the classification accuracy by executing the multiplication algorithm with the
same scalar more than once and combining the classification results over all executions
by averaging. As more executions are used, the average classification result becomes
more consistent and accurate. This is shown mathematically below.
Let X be the random variable of the classification result, and Xi the classification result
on the ith execution. Assuming the Xi are independent, each with the variance of X:

VAR((X1 + X2 + … + Xn) / n)
  = (1/n²) VAR(X1 + X2 + … + Xn)
  = (1/n²) (VAR(X1) + VAR(X2) + … + VAR(Xn))
  = VAR(X) / n
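A quick simulation, my own and with arbitrary Gaussian parameters, confirms the 1/n variance reduction:

```python
# Averaging n independent classification results divides the variance by n.
import random

random.seed(0)
n, trials = 16, 20_000
single = [random.gauss(0.4, 1.0) for _ in range(trials)]
avgd = [sum(random.gauss(0.4, 1.0) for _ in range(n)) / n for _ in range(trials)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / len(xs)

ratio = var(single) / var(avgd)  # should be close to n
```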
The variance of the classification result thus varies inversely with the number of
samples, as demonstrated graphically in figure 6-7. The figure shows the classification
result distribution for a double operation, whose expected classification value lies
between 0 and +1. When results are combined over many executions, the distribution is
concentrated around the mean value, and the probability of obtaining a result less than
zero (a false classification) is low. With the result of a single classification, however, the
distribution is spread over a wide range and there is a significant probability that the
result is less than zero.
[Figure: classification result distributions; the many-iteration distribution is narrow around the mean, while the one-iteration distribution is spread over a wide range]
Figure 6-7: Effect of Combination of Classification Results
6.7 Integer Optimization Model
An integer programming model can be used in conjunction with result combination to
enhance the neural network classification system. There are two pieces of information an
adversary can exploit: first, an add operation is always followed by a double operation;
second, the number of double operations equals the number of scalar bits. These can be
formulated as constraints in an integer programming model.
The inputs to this programming model are the classification results, values between
−1 and +1, for the point operations in the algorithm. The absolute value of a result
represents certainty; one is more certain that an operation is an add if the result is −0.9
rather than −0.5. The certainty information is factored into the programming model.
The decision variables are binary assignment variables; a variable is 1 if an operation
is assigned as a double, and −1 otherwise. The objective function of the model is to
maximize the total certainty of the assignments, stated mathematically as:

maximize Σi xi·yi

where x is the assignment vector and y is the classification result vector.
The GAMS model is shown in Appendix C. Intuitively, the programming model
picks out the most likely sequence of operations that is feasible for this multiplication
algorithm. The table below shows how the integer optimization model may correct
classification errors.
Table 6-1: Error Correction with Integer Optimization Model

    Operation        1     2     3     4     5
    Actual Op.       Dbl   Add   Dbl   Dbl   Add
    Class. Value     +0.9  -0.7  -0.2  +0.8  +0.5
    Class. Result    Dbl   Add   Add   Dbl   Dbl
    Int. Op. Result  Dbl   Add   Dbl   Dbl   Dbl
The entries in boldface represent incorrect classifications. The neural network
system misclassifies operations 3 and 5. The integer optimization model is able to
correct the error at operation 3, since an addition operation followed by another addition
operation violates the model constraints, and classifying that operation as a double
operation yields the optimal solution. However, the integer optimization model fails to
correct the error at operation 5, as that error does not violate any model constraint.
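The error-correcting behaviour of Table 6-1 can be illustrated with a small sketch. The class and method names below are ours, not the thesis's, and only the first constraint (no add followed by add) is modelled; the GAMS model in Appendix C also enforces the number of double operations. For a short window of operations, an exhaustive search over feasible assignments stands in for the integer program:

```java
/** Illustrative stand-in for the GAMS model: brute-force search over valid
 *  operation sequences (+1 = Dbl, -1 = Add) maximizing sum(x_i * y_i).
 *  Constraint modelled: an Add must be followed by a Dbl. */
public class OpSequenceCorrector {
    public static int[] correct(double[] y) {
        int n = y.length;                       // feasible only for short windows
        int bestMask = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean valid = true;
            double score = 0;
            for (int i = 0; i < n; i++) {
                int x = ((mask >> i) & 1) == 1 ? 1 : -1;  // 1 = Dbl, -1 = Add
                if (x == -1 && i + 1 < n && ((mask >> (i + 1)) & 1) == 0)
                    valid = false;              // Add followed by Add is infeasible
                score += x * y[i];
            }
            if (valid && score > bestScore) { bestScore = score; bestMask = mask; }
        }
        int[] x = new int[n];
        for (int i = 0; i < n; i++) x[i] = ((bestMask >> i) & 1) == 1 ? 1 : -1;
        return x;
    }
}
```

Applied to the classification values of Table 6-1, this search reproduces the corrected sequence Dbl, Add, Dbl, Dbl, Dbl.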
7 Experimental Setup and Methodology
This chapter describes the experimental setup and methodology that are generally
applicable to SEMA and DEMA against ECC computations on PDA devices. It begins
by describing the target hardware and software platforms used for the experiments, then
describes the implementation of the target ECC system used in the attacks. The
remainder of the chapter describes the measurement setup and oscilloscope
configuration.
7.1 Target Hardware Platform
The target hardware is a PDA with cell phone, Internet, and email capabilities. This
platform is chosen because it is widely used in industry and serves multiple purposes. To
protect confidentiality, the make of the PDA is omitted from this thesis. Furthermore,
the techniques and analysis illustrated in this thesis are not limited to this particular
hardware platform.
The PDA operates at 56 MHz and is powered by a 32-bit Intel 386 processor. It
contains 16 MB flash memory and 2 MB SRAM. The back of this PDA is removed to
expose the processor to EM signal probing.
7.2 Target Software Platform
The target hardware supports the Java 2, Micro Edition (J2ME) runtime environment,
an optimized environment designed to enable Java applications to run on small
computing devices such as PDAs. The maker of this PDA provides a software
development kit (SDK) for programmers to develop third-party Java applications for
this device.
The target ECC program is written in Java. Implementing the ECC program in
Java for this research presents a unique challenge. The Java programming language
differs from other languages in that a program is both compiled and interpreted. The
program is first compiled to an intermediate language called Java bytecode. At
runtime, the bytecode is interpreted by the Java Virtual Machine (JVM) and translated
into binary instructions to be executed on the PDA's CPU, as illustrated in figure 7-1.
Figure 7-1: Java Runtime Environment
Side channel analysis relies on the correlation of side channel signals with the
data and code being executed. However, during execution of a Java cryptographic
program, a very large portion of the computation time is spent interpreting the Java
program rather than executing it. Therefore a large portion of the side channel
signals, whether power or EM, has no correlation with the program data and code being
executed; the JVM effectively creates a very noisy environment. Furthermore, the OS
on the PDA may perform memory management, handle interrupts, and switch contexts
during a signal acquisition, introducing a tremendous amount of distortion into the
captured EM signals. These factors make side channel analysis very difficult.
Another, more subtle problem is that a Java program takes much longer to execute,
which requires a longer acquisition interval. With the limited memory of a digital
oscilloscope, this often forces the use of a lower sampling frequency.
7.3 ECC Program Implementation
The program is not a complete cryptographic system capable of encryption, decryption,
authentication, and so on. Instead, only the dominant operation in ECC, scalar
multiplication, is implemented.
Before implementing the ECC program, several choices have to be made
regarding the selection of elliptic curve domain parameters. These include the choice of
underlying finite field, field representation, and elliptic curve. These choices in turn
influence the selection of algorithms for field arithmetic and elliptic curve arithmetic
[BHL00].
For implementation on a handheld computing device, the use of elliptic curves
defined over prime fields, GF(p), can yield better performance, as general-purpose
processors provide optimized hardware for integer computations. In contrast, elliptic
curves over binary fields, GF(2^n), require many bit-wise operations that are hard to
implement in software and perform poorly on general-purpose processors.
An elliptic curve E over F_p is specified by the coefficients a, b ∈ F_p of its defining
equation y^2 = x^3 + ax + b. The NIST curves all have a = -3 because this yields a faster
algorithm for point doubling when using Jacobian coordinates. The number of points on
E defined over F_p is nh, where n is prime and h is the co-factor. A random curve over F_p,
where p is an m-bit prime, is denoted by P-m.
The NIST-recommended prime field curve P-192 [NIST] is used in this
implementation. The parameters of this curve are shown in table 7-1. There are many
advantages to using NIST curves. First, the NIST-recommended curves are
designed to be computationally efficient on general-purpose processors. In particular, the
NIST primes have low Hamming weight in binary representation, and all the one bits
occur at positions that are multiples of 32. This permits very fast modular reduction
algorithms in software. Second, NIST provides the parameters (a, b) for curves that have
been studied extensively and are believed to be secure against all common attacks. As
well, it provides a suitable reduction polynomial for the underlying field that allows an
efficient implementation. Furthermore, it provides the order of the curve, which is the
number of points on the curve and is very difficult to compute; the order is important for
some power analysis countermeasures and for testing purposes. Finally, NIST also
provides a base point on the curve, which is also hard to find.
Of all the NIST-recommended curves over prime fields, the P-192 curve has the
fewest bits, and is thus the most suitable for resource-constrained devices such as PDAs.
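The fast modular reduction enabled by the low-Hamming-weight NIST primes can be sketched for p = 2^192 - 2^64 - 1, which satisfies 2^192 ≡ 2^64 + 1 (mod p). The class below is an illustrative BigInteger version of the standard word-wise reduction, not the thesis's implementation:

```java
import java.math.BigInteger;

/** Illustrative fast reduction modulo the P-192 prime p = 2^192 - 2^64 - 1. */
public class P192Reduce {
    public static final BigInteger P = BigInteger.ONE.shiftLeft(192)
            .subtract(BigInteger.ONE.shiftLeft(64)).subtract(BigInteger.ONE);
    private static final BigInteger MASK64 =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    /** Reduce c < p^2 modulo p with shifts, adds, and a few subtractions. */
    public static BigInteger reduce(BigInteger c) {
        BigInteger[] w = new BigInteger[6];                 // 64-bit words c0..c5 of c
        for (int i = 0; i < 6; i++) w[i] = c.shiftRight(64 * i).and(MASK64);
        // Fold the high words down using 2^192 = 2^64 + 1 (mod p):
        BigInteger t  = w[2].shiftLeft(128).or(w[1].shiftLeft(64)).or(w[0]); // (c2,c1,c0)
        BigInteger s1 = w[3].shiftLeft(64).or(w[3]);                         // (0,c3,c3)
        BigInteger s2 = w[4].shiftLeft(128).or(w[4].shiftLeft(64));          // (c4,c4,0)
        BigInteger s3 = w[5].shiftLeft(128).or(w[5].shiftLeft(64)).or(w[5]); // (c5,c5,c5)
        BigInteger r = t.add(s1).add(s2).add(s3);
        while (r.compareTo(P) >= 0) r = r.subtract(P);      // at most a few subtractions
        return r;
    }
}
```

A word-array implementation would use the same folding pattern on 32-bit words; BigInteger is used here only to keep the sketch short.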
Table 7-1: Elliptic Curve P-192 Parameters [NIST]

    Number of bits (m)  192
    Prime order (n)     0x FFFFFFFF FFFFFFFF FFFFFFFF 99DEF836 146BC9B1 B4D22831
    Co-factor (h)       1
    Polynomial          f(x) = x^192 - x^64 - 1
    Parameter a         -3
    Parameter b         0x 64210519 E59C80E7 0FA7E9AB 72243049 FEB8DEEC C146B9B1
    Base point x        0x 188DA80E B03090F6 7CBF20EB 43A18800 F4FF0AFD 82FF1012
    Base point y        0x 07192B95 FFC8DA78 631011ED 6B24CDD5 73F977A1 1E794811
The coordinates of an elliptic curve point are elements of F_p, that is, integers
between 0 and p-1. Assuming that m is the bit size of elements of F_p and each word is
32 bits long, a coordinate is represented as an array of m/32 words. The NIST primes are
chosen such that m is always divisible by 32. The algorithms for field arithmetic used
in this implementation are given in [BHL02]. The API of the Java class for field
arithmetic is given in Appendix A.
The standard representation of a point on an elliptic curve E is in affine
coordinates, P = (x, y), satisfying the (affine) equation y^2 = x^3 + ax + b. However,
the use of affine coordinates requires finite field inversion, a very computationally
expensive operation that takes 10 to 100 times longer than multiplication. It is
therefore advantageous to avoid inversions by representing points in projective
coordinates, of which several types have been proposed. In standard projective
coordinates, the projective point (X : Y : Z), Z ≠ 0, corresponds to the affine point
(X/Z, Y/Z), and the projective equation of the elliptic curve is Y^2 Z = X^3 + aXZ^2 + bZ^3.
In Jacobian projective coordinates [CC87], the projective point (X : Y : Z), Z ≠ 0,
corresponds to the affine point (X/Z^2, Y/Z^3), and the projective equation of the curve is
Y^2 = X^3 + aXZ^4 + bZ^6. Jacobian projective coordinates are used here as they yield the
best overall performance. The algorithms for elliptic curve point addition and doubling are
given in [BHL02]. The API of the Java class for elliptic curve arithmetic is given in
Appendix B.
The base point and the points in the lookup table are given in affine coordinates, while all
intermediate points during scalar multiplication are represented in Jacobian coordinates.
The final result must be converted from Jacobian coordinates back to affine coordinates.
The two point operations used most often are DblJJ and AddJAJ. DblJJ is a
double operation with a Jacobian input and a Jacobian output. AddJAJ is an add
operation with one Jacobian input, one affine input, and a Jacobian output. The
computational costs of the different point operations are given in table 7-2. The target of
the DEMA experiment is the DblJJ operation in the scalar multiplication algorithm.
Table 7-2: Costs of point operations [BHL00]

    Doubling              General addition         Mixed coordinates
    2A -> A  1I, 2M, 2S   A + A -> A  1I, 2M, 1S   J + A -> J   8M, 3S
    2P -> P  7M, 3S       P + P -> P  12M, 2S      J + C -> J  11M, 3S
    2J -> J  4M, 4S       J + J -> J  12M, 4S      C + A -> C   8M, 3S
    2C -> C  5M, 4S       C + C -> C  11M, 3S
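As a sketch of the 2J → J doubling cost in table 7-2, the standard Jacobian doubling formulas for a = -3 can be written as follows. This is our illustration using BigInteger, not the Java class from Appendix B; the M/S comments mark the four field multiplications and four squarings:

```java
import java.math.BigInteger;

/** Illustrative 2J -> J point doubling on P-192 (a = -3), costing 4M + 4S. */
public class DblJJ {
    public static final BigInteger P = new BigInteger(
            "fffffffffffffffffffffffffffffffeffffffffffffffff", 16); // P-192 prime

    /** Double the Jacobian point (X : Y : Z); returns {X3, Y3, Z3}. */
    public static BigInteger[] dbl(BigInteger x, BigInteger y, BigInteger z) {
        BigInteger z2 = z.multiply(z).mod(P);                          // S
        BigInteger m  = BigInteger.valueOf(3)
                .multiply(x.subtract(z2)).multiply(x.add(z2)).mod(P);  // M: M = 3(X-Z^2)(X+Z^2), uses a = -3
        BigInteger y2 = y.multiply(y).mod(P);                          // S
        BigInteger s  = x.multiply(y2).shiftLeft(2).mod(P);            // M: S = 4XY^2
        BigInteger x3 = m.multiply(m).subtract(s.shiftLeft(1)).mod(P); // S: X3 = M^2 - 2S
        BigInteger t  = y2.multiply(y2).shiftLeft(3).mod(P);           // S: T = 8Y^4
        BigInteger y3 = m.multiply(s.subtract(x3)).subtract(t).mod(P); // M: Y3 = M(S - X3) - T
        BigInteger z3 = y.shiftLeft(1).multiply(z).mod(P);             // M: Z3 = 2YZ
        return new BigInteger[]{ x3, y3, z3 };
    }
}
```

With Z = 1 these formulas agree with the affine doubling λ = (3x² - 3)/(2y), which makes the sketch easy to check against the P-192 base point.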
7.4 Measurement Setup and Technique
The EM emanation is received with a near-field probe that is positioned on top of the
processor and attached to a coaxial cable. After wide-band amplification, the EM signals
are captured on a digital phosphor oscilloscope. The LED on the PDA is used as a trigger
signal: it is programmed to turn on at the beginning of an ECC operation of
interest, which triggers the oscilloscope to capture one EM trace. The measurement
setup is illustrated in figure 7-2.
Figure 7-2: Measurement Setup
The EM-6992 near-field probe set from Electro-Metrics is used for this experiment.
The set contains two classes of EM probes: magnetic field (H-field) probes and electric
field (E-field) probes. The E-field probes, a ball probe and a stud probe, completely fail
to capture EM signals from the processor.
The H-field probes are electrically small (i.e., resonant frequency above 1 GHz)
loops of varying sensitivities [EM6992]. The loops are wound within a balanced Faraday
shield that reduces their response to electric fields to a negligible level. Each
successively larger loop increases sensitivity (independent of frequency) by approximately
12 to 15 dB. Probes of lower sensitivity are better at isolating an emission source
precisely. The magnetic probes are used in this experiment to capture EM signals
radiating from the PDA's processor.
The H-field probe is placed directly on the processor to capture the maximum amount
of EM signal, as shown in figure 7-3. The processor is a strong source of information-
dependent EM signals and is easily accessible from the back of the PDA. Other
locations, such as the memory module, may also produce strong EM signals. However,
since memory access is managed by the OS of the PDA, the memory access time is
unpredictable and differs for each run of the ECC algorithm; hence, it is not a reliable
EM side channel.
Figure 7-3: EM Probe
It is very important that the distance between the probe and the EM source is
minimized and kept consistent throughout all experiments. Even small changes in this
spacing can yield large variations in amplitude. In this experiment, the H-field probe is
placed directly on the processor, so the spacing is guaranteed to always be the same. In a
real attack, an adversary may not be able to keep the spacing constant; however, DSP
techniques may be employed to remove variations in signal amplitude due to variations
in spacing.
The H-field probe chosen is the smallest of the set, with a 1 cm loop. A smaller
probe has lower sensitivity, which helps to isolate an emission source more precisely.
As well, the probe is small enough to fit nicely on top of the processor, which is roughly
a 1 cm square. A bigger probe could pick up noisy signals radiating from other devices.
Although a smaller H-field probe has a lower cut-off frequency, the cut-off frequency is
much higher than what is needed in this experiment.
The amplifier used is a broadband preamplifier (Electro-Metrics EM-6990)
inserted in line between the H-field probe and the digital oscilloscope. It provides a
significant improvement over the measurement sensitivity of a typical spectrum
analyzer. Its frequency range is 5 kHz to 1200 MHz, with the cutoff frequencies at
-3 dB gain; the typical gain is 22 dB, and the noise figure is 6 dB [EM6990]. In this
experiment, the sampling rate is between 10 and 50 MHz, well within the frequency
range of the preamplifier and H-field probe.
The voltage across the terminals of the LED on the PDA is used as the trigger signal. In
software, the LED is programmed to turn on at the beginning of an operation of interest.
The rising edge of the LED signal triggers the oscilloscope to begin capturing one
trace of EM signals. The rising edge is used because the rise time is much shorter than
the fall time of the LED signal, which allows the oscilloscope to begin capturing
EM signals as soon as possible. Just as importantly, there is less variation in the rise
time of the LED signal, which reduces jitter, the random horizontal displacement of the
EM signals. However, there still exists a finite rise time in the LED signal, and this does
cause some difficulties in EM analysis.
The EM signals are captured and stored with a digital oscilloscope, a TDS7254
Digital Phosphor Oscilloscope from Tektronix [Tek]. The oscilloscope has many
powerful features; only those pertinent to this experiment are described in this thesis.
The configuration of the oscilloscope is described in the following section.
7.5 Oscilloscope Configuration
The standard way of capturing signals is to acquire a single trace from one execution of
the algorithm. This may be used initially to find the duration of the algorithm and the
distinguishing features of its EM signals. However, it is not sufficient for DEMA or
SEMA, as those experiments often require several hundred traces; more traces yield a
more consistent average signal and a more reliable result. The number of sample points
in each trace is set by the record length, and the number of traces is set by the frame
count.
Moreover, the captured traces must come from consecutive executions of the
algorithm running within a loop. This is the only way to ensure the traces being captured
correspond to the correct set of controlled input values. This mode of capturing multiple
consecutive traces is called fast frame.
Each frame is captured at the rising edge of the trigger signal from the PDA's
LED. However, due to noise, the oscilloscope occasionally fails to trigger properly on
the LED signal, so it is important to check that the number of fast frames captured
matches the number of times the algorithm is executed. Unfortunately, the oscilloscope
cannot detect over-triggering, i.e., acquiring invalid frames due to noise in the trigger
signal. Once the oscilloscope has acquired a preset number of frames, it stops acquiring
more. One trick to overcome this difficulty is to set the frame count on the oscilloscope
higher than the number of times the algorithm is executed. Once the device has
completed every iteration of the algorithm, one can stop the oscilloscope and check the
number of fast frames that were acquired.
Two acquisition modes are used in this experiment: sample mode and
peak detect mode. The input signal is always sampled at each acquisition interval, or
sampling period. In sample mode, the input signal is acquired at the beginning of each
acquisition interval. In peak detect mode, the highest and lowest input signal values are
captured at alternating acquisition intervals. It turns out that peak detect mode is
excellent for DEMA over time domain signals, as only the peak signals are important.
However, sample mode must be used for frequency analysis such as power spectral
density and spectrogram: the theory behind the discrete Fourier transform demands that
the time signal be sampled at regular intervals, so using other acquisition modes for
frequency analysis would lead to incorrect results.
Due to the memory constraints of the oscilloscope, there is a trade-off between the
number of fast frames and the number of sample points. The number of sample points in
turn depends on the frame duration and resolution. Given that the frame duration is fixed
to the duration of the algorithm, an attacker can choose to have a higher number of fast
frames or a higher resolution. A higher number of fast frames gives more consistent
results, whereas a higher resolution provides more detail in the EM signals.
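The trade-off amounts to simple arithmetic. The helper below is hypothetical (the oscilloscope's memory size is only implied by the settings reported later: 25K points per frame at 1300 frames):

```java
/** Illustrative arithmetic for the oscilloscope memory trade-off.
 *  The memory figure used in the test is implied, not specified, by the thesis. */
public class ScopeBudget {
    /** Sample points per frame for a given frame duration and sampling rate. */
    public static long recordLength(double frameSeconds, double sampleHz) {
        return Math.round(frameSeconds * sampleHz);
    }
    /** Maximum number of fast frames that fit in the acquisition memory. */
    public static long maxFrames(long memorySamples, long recordLength) {
        return memorySamples / recordLength;
    }
}
```

For example, a 2 ms frame at 12.5 MHz needs 25,000 points per frame; doubling the sampling rate would halve the number of fast frames that fit.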
8 Experimental Results of DEMA
The focus of the DEMA experiment is to attack the ECC point double operation and
thereby recover the secret scalar in the multiplication algorithm. The first section
describes the experimental setup of DEMA. The second section provides results of the
proposed trace splitting and shows that the MSB of the input point coordinate is the best
partition bit. The next three sections show the DEMA results for time domain, power
spectral density, and spectrogram signals. The final section compares the results of the
three signal types.
8.1 Setup
The EM emanation is captured with 25K sample points at a 12.5 MHz sampling rate
over a 2 ms frame duration and 1300 fast frames, the maximum utilization of the
oscilloscope memory. The double operation takes about 18 ms; the scope is configured
to extract the EM signals from 2 ms to 4 ms of the double operation, where the most
noticeable differential signals are found. The use of 1300 fast frames gives very good
consistency, and using more fast frames does not appear to produce better experimental
results. Signals acquired with sample mode are used for frequency and spectrogram
analysis; those from peak detect mode are used for time domain analysis.
DEMA requires partitioning the EM signals based on a partition bit value. An early
experiment applied two consecutive batches of inputs, with the first batch having a bit
value of 0 and the second batch a bit value of 1. However, it was found that whenever
the outputs of the first n executions are grouped into one set and the outputs of the
second n executions into another, there is a significant group difference between them
regardless of the bit values (a false positive). This is because the average of the EM
signals fluctuates slowly over time, possibly due to other EM sources in the environment
or some other change in conditions within the device.
A second approach, which also fails, is to produce inputs with alternating values in
the partition bit. It is found that when one groups the odd executions (1, 3, 5, ...) into one
set and the even executions (2, 4, 6, ...) into another, there is also a significant group
difference regardless of the bit value. The cause of this group difference is unclear;
perhaps the least significant bit of the loop counter affects the signals being generated.
Either way, the sets from even and odd executions exhibit a group difference regardless
of the input bit value, which leads to false positives.
The final solution is to generate the inputs using a pseudorandom generator and a
fixed seed. This is cumbersome, as the analysis program must use the same
pseudorandom generator and seed to recover which bit value is used at each iteration of
the algorithm. However, this is the only way to ensure that there are no false positives in
the final result.
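A minimal sketch of the fixed-seed approach, assuming java.util.Random as the shared pseudorandom generator (the thesis does not name its generator, and the class and seed here are illustrative):

```java
import java.util.Random;

/** Sketch: device and analysis program regenerate the same partition-bit
 *  sequence from a shared seed, so traces can be partitioned after the fact. */
public class PartitionBits {
    /** Partition-bit value used at each iteration, reproducible from the seed. */
    public static int[] bits(long seed, int iterations) {
        Random prng = new Random(seed);           // same seed on both sides
        int[] b = new int[iterations];
        for (int i = 0; i < iterations; i++) b[i] = prng.nextBoolean() ? 1 : 0;
        return b;
    }
}
```

The acquisition program would build the input for iteration i so that the partition bit equals b[i]; the analysis program replays bits() with the same seed to split the captured traces into the two sets.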
In DEMA one should check that the differential signal from a correct partition is
much stronger than that of an incorrect partition; this is the only way to verify the
effectiveness of DEMA. An incorrect partition is obtained by performing the analysis
using the wrong seed for the pseudorandom generator.
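The partition-and-subtract step itself can be sketched as follows; the class name and the difference-of-means formulation are our reading of DEMA, not the thesis's analysis code:

```java
/** Sketch of the DEMA differential: split traces by partition bit and
 *  subtract the two average traces, D[t] = mean(set1)[t] - mean(set0)[t]. */
public class Dema {
    public static double[] differential(double[][] traces, int[] bits) {
        int len = traces[0].length;
        int n0 = 0, n1 = 0;
        for (int bit : bits) { if (bit == 1) n1++; else n0++; }
        double[] diff = new double[len];
        for (int t = 0; t < len; t++) {
            double s0 = 0, s1 = 0;
            for (int i = 0; i < traces.length; i++) {
                if (bits[i] == 1) s1 += traces[i][t]; else s0 += traces[i][t];
            }
            diff[t] = s1 / n1 - s0 / n0;   // difference of the two set means
        }
        return diff;
    }
}
```

With a correct partition, points where the signal correlates with the partition bit survive the subtraction as peaks; with a wrong seed, the two sets are statistically identical and the differential trace collapses toward zero.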
8.2 Results of Trace Splitting
Figures 8-5, 8-1, and 8-2 show the results of differential EM analysis with correct bit
partitioning using the first, second, and third most significant bit respectively. Clearly
the amplitude of the differential signal diminishes significantly as the partitioning bit
moves away from the MSB. This is because the chance of a carry-out is not as high
when the 2nd or 3rd MSB is one; hence the probability of a carry-out does not correlate
as closely with bits other than the MSB. This demonstrates the impact of carry-outs in
sub-operations within point doubling on the resulting EM signals, and shows that the
MSB is the most suitable partition bit. Figures 8-11, 8-3, and 8-4 illustrate a similar
situation with the spectrogram: they show the results of differential spectrogram analysis
with correct bit partitioning using the first, second, and third most significant bit
respectively.
Figure 8-1: Differential signal for correct bit partitioning on 2nd MSB
Figure 8-2: Differential EM signal for correct partitioning on 3rd MSB
Figure 8-3: Differential EM spectro for correct partitioning on 2nd MSB
Figure 8-4: Differential EM spectro for correct partitioning on 3rd MSB
8.3 Results of Time Domain Analysis
Figure 8-5 shows the differential EM signal when the correct scalar bit is chosen in
DEMA. In contrast, figure 8-6 shows the same analysis when an incorrect scalar bit is
chosen. The 3 SD (standard deviation) and -3 SD curves are included as references in
the plots; signals above the 3 SD curve or below the -3 SD curve are considered
significant. The first figure features multiple significant peaks, whereas the second
figure shows no peaks at all. The peaks likely correspond to the times of the finite field
computations on the x-coordinate of the input point.
Figure 8-5: Differential EM signal of ECC double with correct guess
Figure 8-6: Differential EM signal of ECC double with incorrect guess
8.4 Results of Power Spectrum Density Analysis
As before, the 3 SD and -3 SD curves are used as references in power spectral density
analysis. Figure 8-7 shows the differential PSD signal of the ECC double with correct
bit partitioning, and figure 8-8 shows the differential PSD signal with an incorrect bit
guess. Clearly, no significant peaks are found even with the correct bit guess; in fact,
there is little discernible difference between figures 8-7 and 8-8.
However, when the PSD is computed only from the EM signals between 0.6 ms
and 1.2 ms, a significant differential peak at 4 MHz is found. Figures 8-9 and 8-10 show
the differential PSD signals for correct and incorrect bit partitioning respectively. This
shows that the PSD differential signal does feature peaks, but that they are averaged out
over the large capture interval of figure 8-7.
Figure 8-7: Differential EM PSD of ECC double with correct guess
Figure 8-8: Differential EM PSD of ECC double with incorrect guess
Figure 8-9: Differential PSD of ECC double with correct guess
Figure 8-10: Differential PSD of ECC double with incorrect guess
8.5 Results of Spectrogram Analysis
Figure 8-11 is the differential EM spectrogram for the ECC double operation with a
correct scalar bit guess, whereas figure 8-12 is for an incorrect scalar bit guess. The
3 SD and -3 SD curves are again included in the plots to help distinguish significant
peaks. The SD curves always peak at zero frequency, indicating that there are
considerable fluctuations of the average EM signal over different traces. Clearly, figure
8-11 features many significant peaks above the 3 SD curve and below the -3 SD curve.
Furthermore, peaks in figure 8-11 correlate with those in figure 8-5, such as those that
appear at 0.7 ms and 1.1 ms. This is expected, as the differential EM signal and the
differential EM spectrogram are simply two different perspectives on the same events
unfolding on the PDA device.
Figures 8-13 and 8-14 show the frequency domain signals of a single time frame
at 0.6 ms for correct and incorrect scalar bit guesses respectively. With the correct scalar
bit guess, a significant amount of signal lies beyond the -3 SD curve.
Figure 8-11: Differential EM spectro of ECC double with correct guess
Figure 8-12: Diff EM spectro of ECC double with incorrect guess
Figure 8-13: A frame of differential EM spectro with correct guess
Figure 8-14: A frame of differential EM spectro with incorrect guess
8.6 Comparisons
Table 8-1 shows the greatest multiple of SD_DOM that is less than a given percentile of
peaks for each type of analysis. For example, column two lists the multiples for time
domain analysis with correct partitioning. The second row lists the multiples of
SD_DOM below the top 10th percentile of peaks (the 10% greatest sample points). The
last row shows the ratio of the highest peak to SD_DOM.
Table 8-1: Greatest Multiples of SD_DOM below Peak Amplitude

    Percentile   Time       Time         Frequency  Frequency    Spectro    Spectro
    of Peaks     (correct)  (incorrect)  (correct)  (incorrect)  (correct)  (incorrect)
    10           2.99       1.69         1.66       1.64         5.10       1.49
    1            4.83       2.65         2.61       2.59         6.28       2.28
    0.1          5.86       3.29         3.30       3.31         7.04       2.81
    Highest      7.78       3.99         4.18       4.37         7.73       3.07
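Our reading of the table's metric can be sketched as follows; the exact definition used in the thesis may differ:

```java
import java.util.Arrays;

/** Sketch: the multiple of SD_DOM lying just below a given top-percentile
 *  of peak amplitudes in a differential signal. Interpretation is ours. */
public class PeakMetric {
    public static double multipleAtPercentile(double[] signal, double sdDom, double topPercent) {
        double[] mag = new double[signal.length];
        for (int i = 0; i < signal.length; i++) mag[i] = Math.abs(signal[i]);
        Arrays.sort(mag);                                       // ascending amplitudes
        int idx = (int) Math.ceil(mag.length * (1.0 - topPercent / 100.0));
        if (idx >= mag.length) idx = mag.length - 1;            // clamp to last sample
        return mag[idx] / sdDom;                                // multiple of SD_DOM
    }
}
```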
In time domain and spectrogram analysis, the multiples from correct partitioning
are significantly higher than those from incorrect partitioning. This indicates that
differential analysis was successful with time domain and spectrogram signals, as an
attacker can distinguish between correct and incorrect bit guesses.
In PSD, however, the multiples from correct partitioning are not significantly
higher than those from incorrect partitioning, indicating that differential analysis is
difficult with PSD. If the analysis is focused on a particular time frame within the
double operation, one can get much better differential signals with correct partitioning.
In general, PSD is not good for differential analysis because any local correlation
between EM signal and data is averaged out when the PSD is computed over a long
interval of time.
Although both time domain and spectrogram analysis are suitable for differential
analysis, spectrogram analysis is clearly better. The greatest multiples are
significantly higher for spectrogram than for time domain analysis, indicating that there
are more distinguishable peaks in spectrogram signals with correct partitioning and that
differential analysis is easier with the spectrogram. For example, in time domain
analysis the top 10th percentile of peaks is about three times SD_DOM, whereas in
spectrogram analysis it is about five times the reference. Therefore spectrogram
analysis is the most useful for DEMA.
9 Experimental Results of SEMA
This chapter presents results that show the usefulness of neural network programming in
SEMA attacks through comparison with template attacks. The attack target is the ECC
scalar multiplication algorithm based on the binary expansion of the scalar. The chapter
presents results on the optimization of neural network parameters, the optimization of
template attack parameters, point operation recognition accuracy, the effect of the
integer optimization model, and the effect of using multiple traces. The Matlab code of
the neural network for SEMA is given in Appendix D.
9.1 Setup
The EM emanation is captured with 50K sample points at a 2.5 MHz sampling rate over
20 ms. The double operation takes about 18 ms and the addition operation about 32 ms;
however, analyzing the first 20 ms of these operations is found to be sufficient to
distinguish their signals. The target algorithm is a 192-bit scalar multiplication, which
performs 192 double operations and up to 192 addition operations. At maximum
utilization of the oscilloscope memory, 656 fast frames can be acquired. At each
acquisition, the scope is configured to acquire 200 frames from double operations, 100
frames from addition operations, and an undetermined number of frames from a scalar
multiplication using a random scalar. The first 300 frames are used as training data, and
the frames from the scalar multiplication are used as testing data.
In the experiment, training and testing data from three acquisitions are used. In this
way, 900 training frames and testing data from three scalar multiplications are obtained.
This is needed to ensure the experimental results are consistent.
The figures below (figures 9-1 to 9-4) show EM signals generated from double and
addition operations, along with their average EM signals. Clearly, the EM signals are
quite different even for the same operation; this is evident from the fact that the average
signal over 150 executions is significantly reduced in amplitude. Since there is so much
variation between signals from the same operation, it is not easy to distinguish signals
from addition and double operations; distinguishing the operations from the average
signals is even more difficult. For this reason, even if an attacker can fix the scalar used
in the scalar multiplication, it is not useful to perform SEMA on average EM signals:
most of the distinguishing features are lost in the averaging.
Figure 9-1: EM Signal from ECC Double Operation #1
Figure 9-2: EM Signal from ECC Double Operation #2
Figure 9-3: EM Signal from ECC Addition Operation #1
Figure 9-4: EM Signal from ECC Addition Operation #2
9.2 Parameters of Neural Network
There are a number of parameters whose appropriate values must be found
experimentally. These parameter values are for the preprocessing unit and the neural
network. Those for preprocessing include the envelope size and window size; those for
the neural network include the number of neurons in each layer.
The goal is not to find the mathematically optimal values for these parameters, but
only to find reasonably good ones. The strategy is to use past research and heuristics to
choose a range of reasonable values for each parameter, try a subset of values within the
range to find the resulting classification error rates, and select the parameter value that
yields the lowest error rate. This process optimizes each parameter independently,
without considering the interdependency between parameter values.
9.3 Results of Neural Network using Time Domain signals
There are two parameter values for the preprocessing unit of a neural network using
time domain signals: the envelope size and the minimum variance fraction in principal
component analysis. A signal envelope reduces the signal size and filters out regular
fluctuations in the EM signals that do not contribute to classification accuracy.
However, if the signal envelope is too coarse, it removes important features from the
signal. The envelope size should be divisible by the window size. The optimal envelope
size is found experimentally to be 250 points, as shown in figure 9-5.
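A minimal sketch of an envelope step, assuming a peak-magnitude envelope (the thesis does not spell out its exact envelope definition, so this is one plausible reading):

```java
/** Sketch: collapse each window of `size` samples to its peak magnitude,
 *  shrinking the trace while preserving coarse amplitude features. */
public class Envelope {
    public static double[] envelope(double[] signal, int size) {
        int n = (signal.length + size - 1) / size;   // ceil(length / size) windows
        double[] env = new double[n];
        for (int w = 0; w < n; w++) {
            double peak = 0;
            int end = Math.min((w + 1) * size, signal.length);
            for (int i = w * size; i < end; i++)
                peak = Math.max(peak, Math.abs(signal[i]));
            env[w] = peak;
        }
        return env;
    }
}
```

A larger envelope size shrinks the input to the network more aggressively, which is the trade-off figure 9-5 explores.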
Figure 9-5: Plot of Accuracy vs. Envelope Size (classification error rate vs. envelope size)
The preprocessor performs PCA (principal component analysis) and retains the
components that contribute more than a specified fraction of the total variation in the
data set. The optimal value of this fraction is found to be 0.4%, as shown in figure 9-6.
Figure 9-6: Plot of Accuracy vs. Min Fraction (classification error rate vs. minimum variance fraction)
The size of the neural network is the number of neurons on each layer. The last layer
(the output layer) has only one neuron, whose value gives the final verdict on the
classification result. From past research, the number of neurons at each layer is on the
order of the input size; therefore the heuristic used is to try numbers that are multiples of
the input size. Furthermore, the same size is used for both hidden layers (layers 2 and 3).
The optimal numbers of neurons at the input layer and hidden layers are
interdependent, so it is not possible to optimize these values by considering them
separately. The heuristic below is used to optimize the number of neurons in the input
layer and hidden layers.
Algorithm 9-1: Optimization of the number of neurons
    Let a1 = current number of neurons at input layer
    Let b1 = current number of neurons at hidden layer
    Let a0 = previous number of neurons at input layer
    Let b0 = previous number of neurons at hidden layer
    a1 := 1; b1 := 1; a0 := 0; b0 := 0
    while (a0 <> a1 or b0 <> b1)
        a0 := a1
        Choose a1 using b1 with the least error
        b0 := b1
        Choose b1 using a1 with the least error
    end while
The idea is to iteratively optimize the number of neurons in the input layer and hidden
layers using the previously found optimal values for the hidden layer and input layer
respectively. This process continues until the optimization no longer changes the
number of neurons in the input and hidden layers. The optimal sizes of the input and
hidden layers are found to be 3 and 1 (as multiples of the input size) respectively. The
effects of using other sizes for the input and hidden layers are shown in the figures below.
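Algorithm 9-1 is a coordinate-descent loop, which can be sketched as follows; `error` below is a hypothetical stand-in for training the network with the given layer multiples and measuring the resulting test error:

```python
def optimize_layer_sizes(error, candidates=(1, 2, 3, 4), max_rounds=20):
    """Alternate between optimizing the input-layer and hidden-layer
    multipliers until neither changes (the loop of Algorithm 9-1).

    error: function(a, b) -> error rate for input multiple a, hidden multiple b.
    """
    a1, b1 = 1, 1
    a0, b0 = 0, 0
    rounds = 0
    while (a0 != a1 or b0 != b1) and rounds < max_rounds:
        a0 = a1
        a1 = min(candidates, key=lambda a: error(a, b1))  # best a for fixed b
        b0 = b1
        b1 = min(candidates, key=lambda b: error(a1, b))  # best b for fixed a
        rounds += 1
    return a1, b1

# Toy error surface whose minimum sits at a=3, b=1.
toy = lambda a, b: (a - 3) ** 2 + (b - 1) ** 2 + 0.1 * a * b
a, b = optimize_layer_sizes(toy)
```

The `max_rounds` guard is an added safety net: on a non-convex error surface the alternation could in principle cycle, a case Algorithm 9-1 itself does not address.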
[Chart omitted: error rate (0%–3.5%) plotted against input layer size (0–5 multiples of input size).]
Figure 9-7: Plot of Accuracy vs. Input Layer Size
[Chart omitted: error rate (0%–3.5%) plotted against hidden layer size (0–5 multiples of input size).]
Figure 9-8: Plot of Accuracy vs. Hidden Layer Size
There are a number of different algorithms for training a neural network. They
differ in computational time, and they also produce different classification error rates
even though each is trained to an MSE (mean square error) of less than 10^-10 on the
training data, as shown in table 9-1. This shows that two sets of weights adjusted to give
the same MSE on the training data can perform differently on new test data. For
instance, the network trained with the RP algorithm has significantly higher
classification error rates, while the error rates of networks trained by the other
algorithms are roughly comparable. The CGB algorithm appears to be slightly superior,
with the lowest error rate and computational time. Computational time is not an
important concern in a SEMA attack; it matters more in real-time applications. The
CPU time reported is proportional to the number of clock ticks required in the CPU.
Table 9-1: CPU time and Error % for Different NN Training Algorithms

    Algorithm   CGB    CGP    GDX    SCG    RP
    CPU Time    8.09   8.40   280    11.7   2.98
    Error %     2.64   2.77   2.72   2.83   4.11
The table below summarizes the best parameter values for a neural network system using
time domain signals.
Table 9-2: Optimal parameter values of a NN using Time Domain Signals

    Parameter   Envelope Size   Min. Fraction   Input Layer   Hidden Layer   Training Algorithm
    Value       250             0.4%            3x            1x             CGB
The resulting classification error rate is 2.64%, while the classification error rate on the
training data is 0%. The error rate is calculated from the classification of 9930 point operations.
9.4 Results of Neural Network using Spectrogram Signals
There are four parameter values for the preprocessing unit of a neural network using
spectrogram signals: window size, overlap size, envelope size, and the minimum variance
fraction in principal component analysis. The parameter values for the neural network
design include the number of neurons at each layer and the algorithm used to train the
network.
Spectrogram analysis performs frequency analysis on windows of the signal, and the
window size is the number of time sample points in each window. A larger window
captures higher frequency resolution but lower time resolution, while a smaller window
captures lower frequency resolution but higher time resolution. For instance, if the EM
signal pattern changes rapidly, a smaller window is needed to capture the EM signal at
shorter time intervals. On the other hand, if the information is contained in a large
frequency range of the EM signals, a larger window is needed to capture a wide band of
EM signals. The total number of samples should be divisible by the window size. The
optimal window size is found experimentally to be 2500 points, as shown in figure 9-9.
[Chart omitted: error rate (0%–3%) plotted against window size (0–6000 points).]
Figure 9-9: Plot of Accuracy vs. Window Size
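A from-scratch sketch of the windowed frequency analysis is given below. A real implementation would use an FFT (as Matlab's spectrogram routines do); the direct DFT here is only for illustration of the window/overlap mechanics:

```python
import cmath

def spectrogram(signal, win_size, overlap):
    """Split the signal into windows of win_size samples, overlapping by
    `overlap` samples, and return the DFT magnitudes of each window."""
    step = win_size - overlap
    frames = []
    for start in range(0, len(signal) - win_size + 1, step):
        win = signal[start:start + win_size]
        mags = []
        for k in range(win_size // 2 + 1):        # non-negative frequencies
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / win_size)
                        for n, x in enumerate(win))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames

# 16-sample toy signal at one cycle per 8 samples,
# 8-point windows with a 4-point overlap -> 3 frames.
sig = [cmath.cos(2 * cmath.pi * n / 8).real for n in range(16)]
frames = spectrogram(sig, 8, 4)
```

Each frame localizes the tone in time (its energy lands in frequency bin 1 of every window), which is the time-dependent frequency view that plain power spectrum density lacks.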
The spectrogram windows should overlap so that signal patterns near the edges of the
windows are not lost. The extent of the overlap depends on the input signals and
can be found experimentally. With the window size fixed at 2500 sample points, the
optimal overlap is 1000 points, as shown in figure 9-10.
[Chart omitted: error rate (2.05%–2.30%) plotted against overlap size (0–2500 points).]
Figure 9-10: Plot of Accuracy vs. Overlap Size
A signal envelope plays an important role in improving the input to the neural network.
The window size should be divisible by the envelope size. The most appropriate
envelope size is found experimentally to be 50 points, as shown in figure 9-11.
[Chart omitted: error rate (0%–4%) plotted against envelope size (0–200 points).]
Figure 9-11: Plot of Accuracy vs. Envelope Size
The preprocessor performs PCA and retains the components that contribute more than a
specified fraction of the total variation in the data set. The optimal value of this fraction
is found to be 0.4%, the same as the result found for time domain signals. The error
rates for different fraction values are shown in figure 9-12.
[Chart omitted: error rate (0%–2.5%) plotted against minimum fraction (0%–1%).]
Figure 9-12: Plot of Accuracy vs. Min. Fraction
The size of the neural network is defined by the number of neurons on each layer. Using
the same heuristic as before, the optimal numbers of neurons at the input and hidden
layers are 3 and 1 respectively. These numbers are the same as those found for time
domain signals. The effects of changing the sizes of the input and hidden layers are
shown in the figures below.
[Chart omitted: error rate (1.88%–2.06%) plotted against input layer size (0–5 multiples of input size).]
Figure 9-13: Plot of Accuracy vs. Input Layer Size
[Chart omitted: error rate (0%–3%) plotted against hidden layer size (0–3.5 multiples of input size).]
Figure 9-14: Plot of Accuracy vs. Hidden Layer Size
There are a number of different algorithms for training a neural network; their results
are shown in the table below. Aside from the RP training algorithm, all of the training
algorithms are able to train a neural network that achieves an error rate of about 2%.
The SCG algorithm appears to be superior, as it provides the lowest error rate at a
moderate computational cost.
Table 9-3: CPU time and Error % for Different NN Training Algorithms

    Algorithm   CGB    CGP    GDX    SCG    RP
    CPU Time    14.9   6.98   45.8   14.2   5.45
    Error %     1.89   2.00   1.98   1.87   3.89
The table below summarizes the best parameter values for a neural network system using
spectrogram signals.
Table 9-4: Optimal parameter values of a NN using Spectrogram Signals

    Parameter   Window Size   Overlap Size   Envelope Size   Min. Fraction   Input Layer   Hidden Layer   Training Algorithm
    Value       2500          1000           50              0.4%            3x            1x             SCG
The lowest error rate with the parameters above is 1.87%, while the error rate when the
neural network is tested on training data is 0%. The error rate is calculated from the
classification of 9930 operations.
9.5 Results of Template Attack
As with the neural network, training data are needed to create templates for the different
point operations: addition and double operations. The training and testing data also need
to be preprocessed to reduce their dimension and to remove unnecessary signals that do
not contribute to classification accuracy.
The preprocessing unit consists of two steps. The first step is to create an envelope
of the input signal, and the second step is to select the points that have the greatest
differences in means between the two operations.
After the training data are enveloped, they are grouped into two sets for the addition and
double operations. The standard deviation of the difference of means (SD_DOM) is
found for the two sets of signals. The signal points in SD_DOM with the highest
amplitude correspond to sample points with the largest variations between the two
operations. These signal points are useful for classification, and they are selected to form
the observation matrices for the two operations. Each column of an observation matrix
represents a sample of the experiment, and each row represents a sampling point.
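A simplified sketch of this point-selection step follows; it keeps the sample indices where the two class means differ the most, as a stand-in for the full SD_DOM ranking (the normalization step is omitted here for brevity):

```python
def select_points(traces_add, traces_dbl, num_points):
    """Pick the sample indices where the mean addition trace and the mean
    double trace differ the most (simplified stand-in for SD_DOM ranking)."""
    n = len(traces_add[0])
    mean_add = [sum(t[i] for t in traces_add) / len(traces_add) for i in range(n)]
    mean_dbl = [sum(t[i] for t in traces_dbl) / len(traces_dbl) for i in range(n)]
    diff = [abs(a - d) for a, d in zip(mean_add, mean_dbl)]
    # Indices of the num_points largest differences, returned in time order.
    ranked = sorted(range(n), key=lambda i: diff[i], reverse=True)[:num_points]
    return sorted(ranked)

# Toy 4-sample traces: samples 1 and 3 carry operation-dependent signal.
adds = [[0.0, 1.0, 0.0, 5.0], [0.0, 1.2, 0.0, 4.8]]
dbls = [[0.0, 3.0, 0.0, 5.1], [0.0, 2.8, 0.0, 4.9]]
points = select_points(adds, dbls, 2)
```

The selected indices then define the rows of the observation matrices described above.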
A template comprises a signal component and a noise component. The average of an
observation matrix forms the signal component of a template. A noise vector is
calculated as the difference between a column of the observation matrix and the average
signal. The noise covariance matrix is computed from the noise matrix, and it forms the
noise component of the template.
Using the template, one can calculate the probability of observing a given noise vector.
Given a signal from an unknown operation, one can make a classification decision by
finding the probability of observing its noise vector under the assumption that the signal
comes from an addition or from a double operation. For example, if the probability of
observing the given signal from the addition operation is higher, the source is classified
as an addition operation.
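The template construction and the maximum-likelihood decision can be sketched as follows. For brevity this sketch uses a diagonal (per-point variance) approximation rather than the full noise covariance matrix described above:

```python
import math

def build_template(observations):
    """Mean and per-point noise variance of a set of selected-point vectors
    (a diagonal-covariance simplification of the full noise covariance)."""
    n, m = len(observations), len(observations[0])
    mean = [sum(o[i] for o in observations) / n for i in range(m)]
    var = [sum((o[i] - mean[i]) ** 2 for o in observations) / n for i in range(m)]
    return mean, var

def log_likelihood(template, signal):
    """Log-probability of the noise vector (signal - mean) under the template,
    assuming independent Gaussian noise at each selected point."""
    mean, var = template
    ll = 0.0
    for x, mu, v in zip(signal, mean, var):
        ll += -0.5 * math.log(2 * math.pi * v) - (x - mu) ** 2 / (2 * v)
    return ll

def classify(t_add, t_dbl, signal):
    """Pick the operation whose template makes the observed signal likelier."""
    if log_likelihood(t_add, signal) > log_likelihood(t_dbl, signal):
        return "add"
    return "dbl"

t_add = build_template([[1.0, 5.0], [1.2, 5.2], [0.8, 4.8]])
t_dbl = build_template([[3.0, 5.0], [3.2, 5.2], [2.8, 4.8]])
label = classify(t_add, t_dbl, [1.1, 5.1])
```

As the comparison section later argues, this Gaussian-noise assumption is exactly what limits the template approach on real EM traces.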
Figure 9-15 shows how the classification accuracy varies with the size of the
observation column vector for different envelope sizes. A bigger observation vector
stores more sample points from the input signal. However, the error rate does not
necessarily decrease with a bigger observation vector: an excessively large vector
introduces too much useless information into the template, which may cause more
classification errors. The best configuration uses an envelope size of 500 points and
70 rows in the observation column vector, and gives an error rate of 18.8%.
[Chart omitted: error rate (0%–45%) plotted against observation size (10–80 rows), with one curve per envelope size (200, 250, 500 points).]
Figure 9-15: Plot of Accuracy vs. Observation Size
A sanity check of these classification systems is to check their performance when
classifying training data (data that the system has seen before). Classification of
training data is expected to be much more accurate. A system that has good
classification accuracy on training data is said to have good memory.
The quality of memory is clearly proportional to the size of the observation
column vector, as shown in figure 9-16. In fact, an observation column vector with 100
entries or more can have perfect or near-perfect memory. However, a vector of this size
usually performs poorly with new testing data. This shows that a classification system
with better memory is not necessarily better at classifying new data.
[Chart omitted: memory error rate (0%–40%) plotted against observation size (10–50 rows), with one curve per envelope size (200, 250, 500, 1000 points).]
Figure 9-16: Plot of Memory vs. Observation Size
9.6 Results of Averaging and Integer Optimization Model
The classification accuracy can be increased if the classification results over many
executions are combined by averaging. As more executions are used, the averaged
classification result becomes more consistent and accurate.
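The averaging step can be sketched as follows, assuming the classifier emits a score in [-1, +1] for each point operation of each execution, and that the same scalar (and hence the same operation sequence) is used in every execution:

```python
def average_classify(scores_per_execution):
    """Combine per-operation classifier outputs (values in [-1, +1]) from
    several executions by averaging, then take the sign as the decision."""
    n_ops = len(scores_per_execution[0])
    decisions = []
    for i in range(n_ops):
        avg = sum(run[i] for run in scores_per_execution) / len(scores_per_execution)
        decisions.append(1 if avg > 0 else -1)
    return decisions

# Three executions; the second run misclassifies operation 1,
# but averaging over the three runs recovers the correct label.
runs = [[0.9, -0.7, 0.8],
        [0.8,  0.2, 0.7],
        [0.7, -0.6, 0.9]]
labels = average_classify(runs)
```

This is why the error rate in table 9-5 drops as the number of executions grows: isolated per-execution mistakes are outvoted by the other runs.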
The classification error rates using the different classification systems in conjunction
with the averaging and integer optimization techniques are shown in the table below.
Table 9-5: % Error rate of different algorithms

    # of          Template Attack     NN with Time Domain   NN with Spectrogram
    Executions    w/o IP    w/ IP     w/o IP    w/ IP       w/o IP    w/ IP
    1             18.8      7.31      2.64      1.02        1.87      0.76
    2             15.6      6.88      0.86      0.65        0.61      0.41
    3             13.9      3.13      0.71      0.25        0.52      0.32
Figure 9-17 shows the classification accuracy using a neural network on
spectrogram signals, with respect to the number of executions and whether integer
optimization is used for error correction.
A few trends can be observed from the experimental data. First, the error rate
decreases as results from more executions are used. However, the rate of this decrease
slows down with more executions. Secondly, the classification accuracy is significantly
better with an integer programming model. This indicates the integer programming
model is able to correct some classification errors. However, the effectiveness of the
corrections also diminishes as classification approaches perfect accuracy.
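The integer model of Appendix C can be illustrated with a brute-force miniature in Python (practical instances are solved with a MIP solver, not by enumeration; this toy only demonstrates the constraints):

```python
from itertools import combinations

def correct_with_ip(prob_double, num_doubles):
    """Brute-force miniature of the integer model: choose which time slots
    are double operations so that the total double-probability is maximized,
    exactly num_doubles slots are doubles, and no two adjacent slots are
    both additions (mirroring the NUMDBL and NO2ADJ constraints)."""
    n = len(prob_double)
    best, best_score = None, float("-inf")
    for dbl_slots in combinations(range(n), num_doubles):
        chosen = set(dbl_slots)
        # NO2ADJ: every adjacent pair of slots must contain >= 1 double.
        if any(t not in chosen and t + 1 not in chosen for t in range(n - 1)):
            continue
        score = sum(prob_double[t] for t in chosen)
        if score > best_score:
            best, best_score = chosen, score
    return sorted(best)

# Classifier's per-slot probability that the slot holds a double operation.
p = [0.9, 0.4, 0.6, 0.95, 0.3]
slots = correct_with_ip(p, 3)
```

The no-two-adjacent-additions constraint encodes a structural fact of the binary scalar multiplication: every point addition is followed by a doubling, so an assignment that violates it must contain a classification error, which the model is free to repair.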
[Chart omitted: error rate (0%–2%) plotted against number of executions (1–3), with and without the IP model.]
Figure 9-17: Plot of Accuracy vs. Number of Executions
The final goal of SEMA is to completely recover a scalar. For a 196-bit scalar, the
probability of total scalar recovery for the neural network system using spectrogram
signals is shown in the figure below. Although the neural network system using
spectrograms can achieve a very low bit classification error rate of 1.87%, the success
rate of total recovery of the scalar is only about 2%. With 3 executions, however, the
success rate increases to about 36%. Figure 9-18 also shows the positive effect of using
the integer optimization model for error correction. In the one-execution case, the use of
an IP model increases accuracy from 2% to 22%; in the three-execution case, it increases
accuracy from 36% to 53%.
[Chart omitted: success rate (0%–60%) plotted against number of executions (1–3), with and without the IP model.]
Figure 9-18: Plot of Accuracy vs. Number of Executions
9.7 Comparisons
In this section, the experimental results of the template and neural network attacks are
examined. The signal detection theory approach of the template attack is found to be
ineffective at classifying point operations: the best error rate achieved with this approach
is only about 18.8%. A neural network system, however, can achieve an error rate of
2.64%, and the use of spectrogram signals further decreases this error rate to 1.87%.
As impressive as this may be, a bit classification error rate of 1.87% is still very
high for the complete recovery of a scalar with 196 bits or more. Two further approaches
are to combine classification results over many executions and to use integer
optimization for error correction. These approaches are found to be very effective at
increasing the overall success rate of SEMA.
10 Discussion and Conclusions
10.1 Limitation of Research and Implementation
Due to the memory constraints of the digital oscilloscope, a tradeoff must be made
between the number of fast frames and the number of sample points. Using more fast
frames would yield more consistent results for DEMA and more training data for SEMA.
Using more sample points would yield a greater sampling frequency and better results
for both DEMA and SEMA.
In the experiments, approximately 1000 acquisitions were used to ensure
acceptable consistency in DEMA and a sufficient amount of training data in SEMA. The
sampling frequency is set to the minimum frequency that provides sufficient detail for
SEMA and DEMA. This often requires using a frame duration shorter than the duration
of the operation, due to the limited number of sample points. For example, in DEMA of
the double operation, the capture duration is only 2 ms whereas the entire double
operation takes 20 ms. Only results from the section with the highest differential signal
were shown.
It is possible to merge the results from more than one experiment; for example, the
data for SEMA consist of acquisitions from three experiments. However, the maximum
number of combined acquisitions is still constrained by the memory available on the
computer that processes the acquisitions.
A valid criticism of the SEMA and DEMA experiments is that they focused solely
on the basic ECC scalar multiplication algorithm, based on the binary expansion of the
input scalar and computed over the prime field GF(p). Most optimized implementations
use a window algorithm for scalar multiplication. However, the methodology proposed
here for SEMA and DEMA also works for a sliding window algorithm, although no
experimental data are shown.
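For reference, the basic binary (double-and-add) algorithm that the experiments target can be sketched as follows. Integer arithmetic stands in for the ECC group operations here, and the recorded call sequence is exactly the double/add pattern that SEMA observes:

```python
def scalar_mult(k_bits, point, dbl, add, identity):
    """Left-to-right binary (double-and-add) scalar multiplication.
    k_bits is the scalar in binary, most significant bit first; dbl/add
    are the group operations. A double is performed for every bit, but an
    add only for the 1 bits -- the leakage SEMA exploits."""
    q = identity
    for bit in k_bits:
        q = dbl(q)                 # performed for every scalar bit
        if bit == 1:
            q = add(q, point)      # performed only when the bit is 1
    return q

# Integer arithmetic stands in for the ECC group: "doubling" is *2 and
# "addition" is +, so the result is simply k * point.
trace = []
dbl = lambda q: (trace.append("D"), 2 * q)[1]
add = lambda q, p: (trace.append("A"), q + p)[1]
result = scalar_mult([1, 0, 1, 1], 5, dbl, add, 0)   # k = 0b1011 = 11
```

Reading the recorded trace, a "D" followed by an "A" reveals a 1 bit and a lone "D" reveals a 0 bit, which is precisely why distinguishing the two operations suffices to recover the scalar.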
All experiments are performed on our own implementations of the ECC
computations. The target PDA only supports third-party applications written in Java,
and consequently the ECC computations are implemented in Java. As well, a trigger
signal is generated before each ECC computation, which makes the EM analysis attacks
much simpler. In a complete attack, the attacker needs to identify the starting points of
the point operations to be analyzed.
Finally, this research is not focused on the neural network programming paradigm
in artificial intelligence. Limited research effort was devoted to improving the
algorithms for neural networks; instead, the most typical neural network algorithms and
configurations are used in the SEMA experiments.
10.2 Summary
The purpose of this thesis is to describe new methodologies of SEMA and DEMA against
handheld computing devices. The two main contributions of this research are the novel
uses of spectrogram analysis and of neural network programming for EM analysis
attacks; neither technique has been applied to EM analysis attacks before. As well, there
is no previous experimental work on EM analysis against ECC computations on PDA
implementations.
EM analysis is particularly devastating for handheld devices, as they are more
likely to be exposed to adversaries and their EM signals may be easily captured due to
their device characteristics. In particular, this work focuses on EM analysis of the scalar
multiplication algorithm, which is the dominant operation in an ECC cryptographic
system.
DEMA targets the ECC point double operation, as it is performed at every
iteration regardless of the scalar value. DEMA recovers the secret key by statistical
analysis of the EM traces over many runs of the scalar multiplication operation. It is
discovered that the optimal way of splitting traces is to partition on the MSB of a
point coordinate for prime fields. This is because the MSB value has the greatest
correlation with whether an operation on that value results in a carry-out, which triggers
a series of different computations that produce differential signals in the EM side
channel. The use of other partition bits yields significantly lower differential signals.
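The trace-splitting step of DEMA can be sketched as follows; the toy traces here are fabricated so that one sample leaks the carry-out behaviour, purely for illustration:

```python
def dema_differential(traces, msb_bits):
    """Split EM traces into two sets by the MSB of the predicted intermediate
    coordinate and return the difference-of-means trace. A correct prediction
    of the MSB yields visible peaks; a wrong one averages the signal away."""
    set1 = [t for t, b in zip(traces, msb_bits) if b == 1]
    set0 = [t for t, b in zip(traces, msb_bits) if b == 0]
    n = len(traces[0])
    mean1 = [sum(t[i] for t in set1) / len(set1) for i in range(n)]
    mean0 = [sum(t[i] for t in set0) / len(set0) for i in range(n)]
    return [a - b for a, b in zip(mean1, mean0)]

# Toy traces: only sample 2 depends on the carry-out triggered when MSB = 1.
traces = [[1.0, 2.0, 5.0], [1.0, 2.0, 1.0], [1.0, 2.0, 5.2], [1.0, 2.0, 0.8]]
bits = [1, 0, 1, 0]
diff = dema_differential(traces, bits)
```

The peak in the differential trace at the carry-dependent sample is the signal whose significance is then judged against multiples of SD_DOM, as described below.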
A new quantitative way to judge whether a differential signal is significant was
proposed, which measures the percentile of peaks at different multiples of SD_DOM.
A new type of DEMA that analyzes the EM signals using power spectrum density
and spectrograms was proposed. Spectrogram signals are found to be better suited for
differential analysis. Power spectrum density signals are not effective for differential
analysis because any differential signal in a small time interval gets smeared over the
entire interval of the frequency analysis. Spectrograms do not suffer from this problem
because a spectrogram is a form of time-dependent frequency analysis. As well,
spectrograms are superior to time domain analysis because they are less vulnerable to
jitter in the original EM signals.
Finally, the proposed attack methodology can be extrapolated to different attack
scenarios: where the timings of point operations are known, where the timings are not
known, or where a window multiplication method is used.
A new, innovative technique of SEMA using an artificial intelligence
programming paradigm known as neural networks is proposed to distinguish between
point addition and double operations. The neural network must be presented with
training data so that it can be trained to recognize the point operations. In classification,
the neural network returns a value between +1 and -1: if the result is positive, the
operation is classified as a point doubling; otherwise it is classified as a point addition.
All training and testing data must be preprocessed to reduce the dimensionality of the
input data. The parameters for the preprocessing and neural network units must be found
by experiment and the use of heuristics.
For reference, the accuracy of the new technique is compared against a
technique based on template attacks using optimal signal detection theory. There are
many similarities between the neural network and template attack strategies; one can
consider the weights and parameters of a neural network as a template. Experimentally,
a SEMA based on signal detection theory is found to be much less effective, as the
underlying algorithm assumes a received signal is a linear combination of Gaussian noise
and the underlying signal. In reality, the noise in the ambient environment is not
Gaussian. More importantly, the underlying EM signal is distorted non-linearly by
events that occur in the run-time environment of the computing device. It is found that
the use of spectrogram signals for classification can slightly improve the classification
accuracy of the neural network system.
There are other techniques that can further improve the classification accuracy.
Taking an average of the classification results over many executions can drastically
improve the accuracy, and in practice the attacker is expected to be able to obtain EM
signals over multiple executions for SEMA. Furthermore, the use of an ILP model is
effective for error correction, which leads to greater classification accuracy.
10.3 Countermeasures
There are three common approaches to resist simple analysis (i.e. SEMA):
indistinguishable formulas for point operations [LS01] [JQ01], identical operation
sequence independent of key bits [Cor99] [Mo87], and random addition-subtraction chain
[OA01]. These approaches are described in chapter 4.
The use of indistinguishable formulas appears to be the best of the three. The use of an
identical operation sequence has considerable overhead because it performs a point
addition and a doubling at every iteration regardless of the scalar bit values. The random
addition-subtraction chain is vulnerable in light of the new, accurate classification
algorithm based on neural networks: with an accurate classification system and a hidden
Markov model [KW03], the scalar bits may be recovered.
Scalar multiplication algorithms that are secure against simple analysis may still be
vulnerable to differential analysis (i.e. DEMA). There are two common approaches to
resisting differential analysis: randomizing the base point P and randomizing the scalar k
in the scalar multiplication. Both of these approaches can effectively counter differential
analysis.
One shortcoming of these conventional countermeasures is that they all rely on
randomization. However, implementing a true random number generator is very difficult
and costly. Most implementations use pseudo-random generators and derive seed values
from non-random sources such as the time of day or mouse movements. These pseudo-
random generators are prone to attacks. A superior type of countermeasure is one that
does not require the use of any random values.
In the differential analysis experiments, the MSB of the input point is always
chosen as the partition bit. When the MSB is one, there is a much greater chance that a
carry-out will occur in the finite field calculations on the input point. In fact, this bit
is the only partition bit that works for differential analysis. A possible countermeasure is
to use the same algorithm for finite field computations regardless of whether a carry-out
occurs.
10.4 Future Work
Currently, all point operations activate the trigger signal to indicate the timing of point
operations to the digital oscilloscope. Future research should be devoted to
performing SEMA and DEMA without this trigger signal, as an adversary would not
have one available in a complete attack. This may be a particularly challenging
research problem for SEMA, as it requires precise timing information for the point
operations. It is perhaps similar to the segmentation problem in speech analysis,
where the system needs to identify the timings of the words spoken in a conversation.
Further research can be devoted to applying the techniques in this thesis to more
practical ECC scalar multiplication algorithms such as window scalar multiplication.
It may also be worthwhile to attempt an attack on an algorithm that is resistant to SEMA
and DEMA, to see how well it resists the new attack methodologies proposed here.
There is much similarity between template attacks and the neural network
classification system. It should be possible to modify the neural network system to
perform DEMA, as is done with template attacks. The pruning techniques used in
template attacks can be applied directly to the neural network system.
There are many details of the neural network that can be further investigated. A
typical back-propagation network is used in these experiments; however, other networks
such as radial basis networks are also possible and are used in other recognition systems
[MW]. As well, the preprocessing employed here uses a very common strategy for
minimizing data redundancy; perhaps a better technique tailored toward EM signals can
be developed for preprocessing.
As well, classification systems for speech and images use many different
techniques besides neural networks. Some speech analysis systems use techniques based
on HMMs (Hidden Markov Models). These techniques may prove to be useful for EM
analysis attacks as well.
The techniques developed here can be applied to other side channel sources, such
as power. It would be of interest to see how effective these techniques are on other side
channel sources.
Appendix A – Java API for 192-bit Prime Field (pf192)
public pf192(long[] seg)
    Construct a pf192 object with a 192-bit element represented as an array of six long integers

public pf192(String bitString)
    Construct a pf192 object with a 192-bit element represented as a hexadecimal string

public pf192 add(pf192 a)
    Return the sum of this pf192 element and the given element

public static pf192 add(pf192 a, pf192 b)
    Return the sum of the two pf192 elements

public pf192 sub(pf192 a)
    Return the result of this pf192 element minus the given element

public static pf192 sub(pf192 a, pf192 b)
    Return the result of pf192 element a minus element b

public pf192 mul(pf192 a)
    Return the product of this pf192 element and the given element

public static pf192 mul(pf192 a, pf192 b)
    Return the product of pf192 elements a and b

public pf192 square()
    Return the square of this pf192 element

public pf192 inv()
    Return the inverse of this pf192 element

public pf192 lshift(int shift)
    Left-shift this pf192 element by a given number of bits. This is used for scalar multiplication.

public byte compareTo(pf192 element)
    Compare this pf192 element with a given element. Return 0 if they are equal, -1 if they are additive inverses, and +1 otherwise

public boolean equalTo(pf192 element)
    Return true if this pf192 element equals the given element; false otherwise
Appendix B – Java API for 192-bit ECC (ECC_P192)
public ECC_P192()
    Construct a point at infinity

public ECC_P192(pf192 x, pf192 y)
    Construct an affine point with x and y coordinates

public ECC_P192(pf192 x, pf192 y, pf192 z)
    Construct a Jacobian point with x, y and z coordinates

public ECC_P192(pf192 x, pf192 y, pf192 z, pf192 z2, pf192 z3)
    Construct a Chudnovsky Jacobian point with x, y, z, z2 and z3 coordinates

public ECC_P192(ECC_P192 pt)
    Copy constructor

public ECC_P192 dblAC()
    Return the double of this affine point as a Chudnovsky Jacobian point

public ECC_P192 dblAJ()
    Return the double of this affine point as a Jacobian point

public ECC_P192 dblJJ()
    Return the double of this Jacobian point as a Jacobian point

public ECC_P192 addCCC(ECC_P192 pt)
    Return the sum of this point and a given point; all are in Chudnovsky Jacobian coordinates

public ECC_P192 addJAJ(ECC_P192 pt)
    Return the sum of this Jacobian point and a given affine point, as a Jacobian point

public ECC_P192 addJCJ(ECC_P192 pt)
    Return the sum of this Jacobian point and a given Chudnovsky Jacobian point, as a Jacobian point

public void toAffine()
    Convert this point to affine coordinates

public String toString()
    Return the hexadecimal values of the point coordinates

public int compareTo(ECC_P192 pt)
    Compare this point to a given point. Return 0 if they are equal, -1 if they are additive inverses, +1 otherwise
public ECC_P192 binary(int[] scalar)
    Return the scalar multiple of this element by the given scalar, using the multiplication algorithm based on the binary expansion of the scalar bits

public ECC_P192 slidewin(int[] scalar)
    Return the scalar multiple of this element by the given scalar, using the sliding window algorithm
Appendix C – GAMS Model for Integer Optimization

SETS
    T   time slots /1*331/;
ALIAS (T,T1),(T,T2);

PARAMETERS
    P(T)   probability value /1 0.318248  2 0.988648 …/
    C(T)   check value       /1 1  2 1 …/;

SCALAR D number of double ops /191/;

VARIABLES
    X(T)   assigns this time slot for dbl op
    Z      maximize total probability
    ERR    bit error rate;
BINARY VARIABLE X;

EQUATIONS
    PROB            define objective function
    NUMDBL          restrict the number of dbl op
    NO2ADJ(T1,T2)   no two adjacent add ops allowed
    BERR            calculate error rate;

PROB ..   Z =E= SUM(T, P(T)*X(T));
NUMDBL .. SUM(T, X(T)) =E= D;
NO2ADJ(T1,T2)$(ord(T1) eq (ord(T2)-1)) .. X(T1) + X(T2) =G= 1;
BERR ..   ERR =E= (card(T)-SUM(T, (2*X(T)-1)*C(T)))/2;

OPTION SUBSYSTEMS;
model ready /all/;
solve ready using MIP maximizing Z;
Appendix D – Matlab Code for Neural Networks

% AI SPA attack

% General parameters
record_len = param(1);
period = param(2)*1000;     % ms
sample_rate = 1/period;     % kHz
record_count = param(6);

% Pre-processing parameters
win_size = 2500;
step_size = 1500;
env_size = 50;

% Load training and testing data
load('../mult/init0.mat', '-mat');
feature0 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;
load('../mult/init1.mat', '-mat');
feature1 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;
load('../mult/init2.mat', '-mat');
feature2 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;

% Format training data
train_result = repmat([ones(1,150) -1*ones(1,150)], 1, 3);
train_feature1 = [feature0(:,1:300) feature1(:,1:300) feature2(:,1:300)];
train_feature = [train_feature1 train_feature1 feature0(:,1:300)];

% Preprocess training data
[pn,meanp,stdp] = prestd(train_feature);
[ptrans,transMat] = prepca(pn,0.004);
[i, trainSize] = size(train_feature1);
ptrans = ptrans(:,1:trainSize);
train_result = train_result(:,1:trainSize);

% Format testing data
test_feature = [feature0(:,301:631) feature1(:,301:631) feature2(:,301:631)];
test_result = [1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;1;1;-1;1;-1;
    1;1;1;-1;1;-1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;-1;1;1;1;1;1;1;-1;1;-1;1;1;-1;
    1;-1;1;1;1;1;1;-1;1;1;-1;1;1;1;1;-1;1;-1;1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;1;1;-1;1;1;1;-1;1;-1;
    1;1;-1;1;-1;1;1;1;1;-1;1;-1;1;1;-1;1;-1;1;1;-1;1;1;1;-1;1;-1;1;1;-1;1;1;1;-1;1;1;1;1;-1;1;1;
    -1;1;1;1;1;1;1;-1;1;-1;1;1;1;1];

% Preprocess testing data
[p2n] = trastd(test_feature,meanp,stdp);
[p2trans] = trapca(p2n,transMat);
[i, testSize] = size(test_feature);
[dlen, i] = size(ptrans);

% Testing
tsize = 10;    % test 10 times
miss = 0;
elapse = 0;
for tstep = 1:tsize
    % Create neural network
    net1 = newff(minmax(p2trans),[3*dlen 1*dlen 1*dlen 1],{'tansig' 'tansig' 'tansig' 'tansig'},'trainrp');
    net1.trainParam.epochs = 1000;
    net1.trainParam.goal = 1e-10;
    net1.trainParam.min_grad = 0;

    % Train and simulate neural network
    stime = cputime;
    [net1,tr] = train(net1,ptrans,train_result);
    elapse = elapse + cputime - stime;
    op = sim(net1,p2trans);

    % Compare test results
    for j = 1:testSize
        if (sign(op(j)) ~= sign(test_result(mod(j-1,331)+1)))
            miss = miss + 1;
        end
        %score = score + op(j) * test_result(j);
    end
end

% Display results
s = sprintf('Number of misses: %d (%g out of %d)', miss, miss/tsize/testSize, 331);
disp(s);
s = sprintf('Time elapsed: %g', elapse/tsize);
disp(s);
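The preprocessing helpers used above (prestd, prepca, trastd, trapca) come from the MATLAB Neural Network Toolbox of that era: they standardize each feature row to zero mean and unit variance, then project onto the principal components, discarding components that contribute less than a given fraction (here 0.004) of the total variance. As an illustrative sketch only (not part of the thesis code), the same preprocessing steps can be expressed in NumPy; the function names mirror the MATLAB helpers but are otherwise hypothetical:

```python
import numpy as np

def prestd(p):
    """Standardize each row (feature) to zero mean and unit variance,
    like the MATLAB NN Toolbox prestd."""
    meanp = p.mean(axis=1, keepdims=True)
    stdp = p.std(axis=1, ddof=1, keepdims=True)
    return (p - meanp) / stdp, meanp, stdp

def prepca(pn, min_frac):
    """Project onto principal components, discarding those contributing
    less than min_frac of the total variance, like prepca."""
    u, s, _ = np.linalg.svd(pn, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    trans_mat = u[:, var_frac > min_frac].T   # rows = kept components
    return trans_mat @ pn, trans_mat

def trastd(p2, meanp, stdp):
    """Standardize new data with the training-set statistics, like trastd."""
    return (p2 - meanp) / stdp

# Toy demo: 4 features x 10 observations (random stand-in for EM features)
rng = np.random.default_rng(0)
p = rng.normal(size=(4, 10))
pn, meanp, stdp = prestd(p)
ptrans, trans_mat = prepca(pn, 0.004)
# trapca equivalent: apply the training projection to standardized test data
p2trans = trans_mat @ trastd(p, meanp, stdp)
```

Applying the training-set mean, standard deviation, and projection matrix to the test data (rather than recomputing them) matches the trastd/trapca calls in the attack code and keeps the training and test features in the same coordinate system.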
Bibliography
[AAR02] D. Agrawal, B. Archambeault, J.R. Rao, and P. Rohatgi, "EM side-channel(s): attacks and assessment methodologies," April 2005; http://www.research.ibm.com/intsec/emf-paper.ps.

[ANC] GuruNet, "Answers.com," April 2005; http://www.answers.com/topic/gradient-descent.

[ANSI] ANSI X9.62, Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1999.

[B95] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

[BDL] D. Boneh, R.A. DeMillo, and R.J. Lipton, "On the importance of checking computations," April 19, 2005; http://jya.com/smart.pdf.

[BHL00] M. Brown, D. Hankerson, J. Lopez, and A. Menezes, "Software implementation of NIST elliptic curves over binary fields," CHES 2000, LNCS 1965, Springer-Verlag, 2000, pp. 1 ff.

[BJ02] E. Brier and M. Joye, "Weierstraß elliptic curves and side-channel attacks," PKC 2002, LNCS 2274, Springer-Verlag, 2002, pp. 335-345.

[BS96] E. Biham and A. Shamir, "Research announcement: A new cryptanalytic attack on DES," October 18, 1996; http://jya.com/dfa.htm.

[C99] J. Coron, "Resistance against differential power analysis for elliptic curve cryptosystems," CHES 1999, LNCS 1717, Springer-Verlag, 1999, pp. 292-302.

[CC87] D. Chudnovsky and G. Chudnovsky, "Sequences of numbers generated by addition in formal groups and new primality and factoring tests," Advances in Applied Mathematics, 1987, pp. 385-434.

[CRR02] S. Chari, J.R. Rao, and P. Rohatgi, "Template attacks," CHES 2002, LNCS 2523, Springer-Verlag, 2002, pp. 172-186.

[EM6990] Electro-Metrics Inc., Instruction Manual: Broadband Amplifier Model EM-6990, 2004.

[EM6992] Electro-Metrics Inc., Instruction Manual: Near Field Probe Set Broadband Response Model EM-6992, 2004.

[GMO01] K. Gandolfi, C. Mourtel, and F. Olivier, "Electromagnetic analysis: concrete results," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 251-261.

[IT02a] T. Izu and T. Takagi, "A fast parallel elliptic curve multiplication resistant against side channel attacks," Technical Report CORR 2002-03, University of Waterloo, 2002; http://www.cacr.math.uwaterloo.ca/.

[IT02b] T. Izu and T. Takagi, "On the security of Brier-Joye's addition formula for Weierstrass-form elliptic curves," Technical Report No. TI-3/02, Technische Universität Darmstadt, 2002; http://www.informatik.tu-darmstadt.de/TI/.

[JQ01] M. Joye and J. Quisquater, "Hessian elliptic curves and side-channel attacks," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 402-410.

[K96] P. Kocher, "Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems," CRYPTO '96, 1996, pp. 104-113.

[KJJ99] P. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," CRYPTO '99, Springer-Verlag, 1999, pp. 388-397.

[Koblitz] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, 1987, pp. 203-209.

[KW03] C. Karlof and D. Wagner, "Hidden Markov model cryptanalysis," CHES 2003.

[LS01] P. Liardet and N. Smart, "Preventing SPA/DPA in ECC systems using the Jacobi form," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 391-401.

[M87] P. Montgomery, "Speeding the Pollard and elliptic curve methods of factorization," Mathematics of Computation, vol. 48, 1987, pp. 243-264.

[Miller] V. Miller, "Use of elliptic curves in cryptography," CRYPTO '85, LNCS 218, Springer-Verlag, 1986, pp. 417-426.

[Murray] K.D. Murray, "The great seal bug story," Murray Associates, 2002.

[MW] MathWorks, "Online MATLAB documentation," April 2005; http://www.mathworks.com/access/helpdesk/help/helpdesk.html.

[NIST] National Institute of Standards and Technology, Recommended Elliptic Curves for Federal Government Use, Appendix to FIPS 186-2, 2000.

[O02] E. Oswald, "Enhancing simple power-analysis attacks on elliptic curve cryptosystems," CHES 2002, LNCS 2523, Springer-Verlag, 2002, pp. 82 ff.

[OA01] E. Oswald and M. Aigner, "Randomized addition-subtraction chains as a countermeasure against power attacks," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 39 ff.

[OS00] K. Okeya and K. Sakurai, "Power analysis breaks elliptic curve cryptosystems even secure against the timing attack," INDOCRYPT 2000, LNCS 1977, Springer-Verlag, 2000, pp. 178-190.

[QS01] J.J. Quisquater and D. Samyde, "ElectroMagnetic analysis (EMA): measures and counter-measures for smart cards," Smart Card Programming and Security (E-smart 2001), LNCS 2140, 2001, pp. 200-210.

[R96] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.

[S96] L. Smith, "An introduction to neural networks," October 1996; http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html.

[SEC2] Standards for Efficient Cryptography Group/Certicom Research, SEC 2: Recommended Elliptic Curve Cryptography Domain Parameters, Version 1.0, 2000.

[Tek] Tektronix Inc., User Manual: Digital Phosphor Oscilloscopes TDS7254, 2003.

[WW4] J. Waddle and D. Wagner, "Towards efficient second-order power analysis," CHES 2004, Springer-Verlag, 2004, pp. 1-15.

[YR01] K.Y. Yeung and W.L. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, 17(9), 2001, pp. 763-774.