EM Analysis of ECC Computations
on Mobile Devices
by
Simon C. K. Ho
A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2005
© Simon C. K. Ho 2005
I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the
purpose of scholarly research.
I further authorize the University of Waterloo to reproduce this thesis by photocopying or by
other means, in total or in part, at the request of other institutions or individuals for the purpose of
scholarly research.
The University of Waterloo requires the signatures of all persons using or photocopying
this thesis. Please sign below, and state an address and date.
Abstract
Internet-enabled mobile devices, such as PDAs and mobile phones, open the door to a
slew of new commercial applications and services. However, these devices also impose a
unique set of security challenges due to their mobility. In particular, they may be
vulnerable to a type of side channel attack known as EM analysis, which analyzes the
correlation between the leaked EM emanations and the secrets in the cryptographic
computations. Understanding the threats of EM analysis is vital to evaluating the security
of these PDA devices.
Many secure applications use public key cryptography to provide
authentication, integrity, non-repudiation and encryption. ECC (Elliptic Curve
Cryptography) is a particularly suitable public-key system for mobile devices as it is
more efficient than other common cryptographic systems.
This thesis explores the vulnerabilities of ECC computations on mobile devices to
EM analysis: SEMA (Simple EM Analysis) and DEMA (Differential EM Analysis).
New analysis techniques and attack methodologies pertinent to ECC and PDAs are
proposed. It is found that the use of AI neural network techniques and integer
optimization models can improve the power of SEMA. In DEMA, the choice of partition
bit and reference signal can increase its effectiveness.
Furthermore, the use of power spectrum density and spectrogram signals for EM analysis
is proposed and examined. It is found that spectrogram signals are more effective than
time domain and power spectrum density signals for both SEMA and DEMA.
Acknowledgements
I would like to thank my supervisor, Professor Cathy Gebotys, for all her advice,
guidance and encouragement. I would also like to thank my parents and friends for their
support.
I greatly appreciate the generous financial support provided by Professor Gebotys
through a Research Assistantship. I am also grateful for the scholarships awarded to me
by the Department of Electrical and Computer Engineering at the University of Waterloo.
Table of Contents

Abstract
List of Tables
List of Figures
List of Algorithms
1 Introduction
  1.1 Research Motivation
  1.2 Thesis Objective
  1.3 Thesis Overview
2 Introduction to EM Signal Capture and Analysis
  2.1 Origin and Types of EM Signals
  2.2 Capture of EM Signals
  2.3 Direct Emanation
  2.4 Unintended Emanation
  2.5 Benefits of EM Analysis of PDA
  2.6 Spectrogram
3 Introduction to Side Channel Attacks
  3.1 Timing Analysis
  3.2 Fault Analysis
  3.3 Simple Analysis on Power/EM Signals
  3.4 Differential Analysis on Power/EM Signals
  3.5 Template Attack on Power/EM Signals
4 Introduction to Elliptic Curve Cryptography
  4.1 Mathematical Overview
  4.2 Benefits for PDA Implementation
  4.3 Implementation
  4.4 Countermeasures to Thwart Side Channel Attacks
5 Proposed Methodology of DEMA
  5.1 Proposed Trace Splitting Strategy
  5.2 Proposed Differential Analysis of Traces
  5.3 Proposed Differential Analysis in Frequency and Spectrogram
  5.4 Attack Strategy on Known Point Operation
  5.5 Proposed Attack Strategy on Unknown Point Operation
  5.6 Proposed Attack Strategy on Window Method
6 Proposed Methodology of SEMA
  6.1 Motivation of Using Neural Network
  6.2 Neural Network Structure
  6.3 Preprocessing
  6.4 Training
  6.5 Classification
  6.6 Combination of Classification Results
  6.7 Integer Optimization Model
7 Experimental Setup and Methodology
  7.1 Target Hardware Platform
  7.2 Target Software Platform
  7.3 ECC Program Implementation
  7.4 Measurement Setup and Technique
  7.5 Oscilloscope Configuration
8 Experimental Results of DEMA
  8.1 Setup
  8.2 Results of Trace Splitting
  8.3 Results of Time Domain Analysis
  8.4 Results of Power Spectrum Density Analysis
  8.5 Results of Spectrogram Analysis
  8.6 Comparisons
9 Experimental Results of SEMA
  9.1 Setup
  9.2 Parameters of Neural Network
  9.3 Results of Neural Network Using Time Domain Signals
  9.4 Results of Neural Network Using Spectrogram Signals
  9.5 Results of Template Attack
  9.6 Results of Averaging and Integer Optimization Model
  9.7 Comparisons
10 Discussion and Conclusions
  10.1 Limitation of Research and Implementation
  10.2 Summary
  10.3 Countermeasures
  10.4 Future Work
Appendix A – Java API for 192-bit Prime Field (PF192)
Appendix B – Java API for 192-bit ECC (ECC_P192)
Appendix C – GAMS Model for Integer Optimization
Appendix D – MATLAB Code for Neural Networks
Bibliography
List of Tables
Table 6-1: Error Correction with Integer Optimization Model
Table 7-1: Elliptic Curve P-192 Parameters [NIST]
Table 7-2: Costs of Point Operations [BHL00]
Table 8-1: Greatest Multiples of SD_DoM Below Peak Amplitude
Table 9-1: CPU Time and Error % for Different NN Training Algorithms
Table 9-2: Optimal Parameter Values of a NN Using Time Domain Signals
Table 9-3: CPU Time and Error % for Different NN Training Algorithms
Table 9-4: Optimal Parameter Values of a NN Using Spectrogram Signals
Table 9-5: % Error Rate of Different Algorithms
List of Figures
Figure 2-1: CMOS Gate
Figure 2-2: 3D Diagram of EM Emanation [QS01]
Figure 2-3: Near Field Probe [GMO01]
Figure 2-4: Spectrogram
Figure 5-1: MSB Partitioning
Figure 5-2: Constant Reference Signal
Figure 5-3: SD-DoM Reference Signal
Figure 5-4: Ideal Differential Signal
Figure 5-5: Actual Differential Signal
Figure 5-6: Relationship Between Different Signals
Figure 5-7: Attack Strategy on Unknown Point Operation
Figure 5-8: Differential Signal for Different Key Bits
Figure 6-1: Signal Component of Operation A Template
Figure 6-2: Signal Component of Operation B Template
Figure 6-3: Neural Network Structure [S96]
Figure 6-4: Tan-Sigmoid Transfer Function [MK]
Figure 6-5: Signal Envelope
Figure 6-6: Gradient Descent Algorithm [ANC]
Figure 6-7: Effect of Combination of Classification Results
Figure 7-1: Java Runtime Environment
Figure 7-2: Measurement Setup
Figure 7-3: EM Probe
Figure 8-1: Differential Signal for Correct Bit Partitioning on 2nd MSB
Figure 8-2: Differential EM Signal for Correct Partitioning on 3rd MSB
Figure 8-3: Differential EM Spectro for Correct Partitioning on 2nd MSB
Figure 8-4: Differential EM Spectro for Correct Partitioning on 3rd MSB
Figure 8-5: Differential EM Signal of ECC Double with Correct Guess
Figure 8-6: Differential EM Signal of ECC Double with Incorrect Guess
Figure 8-7: Differential EM PSD of ECC Double with Correct Guess
Figure 8-8: Differential EM PSD of ECC Double with Incorrect Guess
Figure 8-9: Differential PSD of ECC Double with Correct Guess
Figure 8-10: Differential PSD of ECC Double with Incorrect Guess
Figure 8-11: Differential EM Spectro of ECC Double with Correct Guess
Figure 8-12: Diff EM Spectro of ECC Double with Incorrect Guess
Figure 8-13: A Frame of Differential EM Spectro with Correct Guess
Figure 8-14: A Frame of Differential EM Spectro with Incorrect Guess
Figure 9-1: EM Signal from ECC Double Operation #1
Figure 9-2: EM Signal from ECC Double Operation #2
Figure 9-3: EM Signal from ECC Addition Operation #1
Figure 9-4: EM Signal from ECC Addition Operation #2
Figure 9-5: Plot of Accuracy vs. Envelope Size
Figure 9-6: Plot of Accuracy vs. Min Fraction
Figure 9-7: Plot of Accuracy vs. Input Layer Size
Figure 9-8: Plot of Accuracy vs. Hidden Layer Size
Figure 9-9: Plot of Accuracy vs. Window Size
Figure 9-10: Plot of Accuracy vs. Overlap Size
Figure 9-11: Plot of Accuracy vs. Envelope Size
Figure 9-12: Plot of Accuracy vs. Min. Fraction
Figure 9-13: Plot of Accuracy vs. Input Layer Size
Figure 9-14: Plot of Accuracy vs. Hidden Layer Size
Figure 9-15: Plot of Accuracy vs. Observation Size
Figure 9-16: Plot of Memory vs. Observation Size
Figure 9-17: Plot of Accuracy vs. Number of Executions
Figure 9-18: Plot of Accuracy vs. Number of Executions
List of Algorithms
Algorithm 4-1: Double-and-Add Scalar Multiplication
Algorithm 4-2: Add-Subtract Scalar Multiplication
Algorithm 4-3: Double-and-Add-Always Scalar Multiplication
Algorithm 5-1: Modular Addition
Algorithm 5-2: Modular Subtraction
Algorithm 5-3: Standard Deviation of Difference of Means
Algorithm 5-4: Ratio Between Differential and Reference Signal
Algorithm 5-5: Spectrogram
Algorithm 9-1: Optimization of the Number of Neurons
1 Introduction
Mobile Commerce, or m-commerce, encompasses a variety of commercial services and
products that are accessible from Internet-enabled mobile devices, such as PDAs and
mobile phones. Their mobility opens the door to a slew of new applications and services.
They follow us wherever we go, making it possible to shop online while riding on a
subway train or finding a nearby restaurant while walking down the street. However,
these devices impose a unique set of constraints and security challenges.
The security of m-commerce relies on the underlying public key cryptographic
functions to provide authentication, integrity, non-repudiation and encryption.
Traditional cryptanalysis techniques view a cryptography system as a black-box, and
exploit weaknesses purely at the algorithm and protocol levels. However, it is far more
powerful to exploit weaknesses in the implementation level, particularly information
inadvertently leaked to other information channels known as side channels.
Most side channel attacks require the attackers to physically access and tamper
with the mobile devices. A careful user can prevent these attacks by securing the device
from theft, or at least minimize the damage upon discovering that it is lost.
A much more devastating attack would be one that requires no physical access
and can be performed without the user’s knowledge, say, when an unsuspecting user is
shopping online during a subway ride. This is possible if the attack analyzes information
leaked from electromagnetic (EM) emanations from these mobile devices. EM radiation
from mobile devices may be captured from several feet away [AAR02]. Furthermore,
the attack would be even more devastating if the attacker can perform the analysis without
knowing the plaintext and ciphertext in the cryptographic operations. This thesis
explores the feasibility of performing EM analysis on a type of public key cryptographic
system, Elliptic Curve Cryptographic (ECC) system, running on mobile devices.
1.1 Research Motivation
Conventional cryptanalysis techniques tend to need a huge amount of computational
resources relative to side channel attacks. Therefore side channel attacks are more
threatening than conventional cryptanalysis in practice.
In the past few years, much research attention has been afforded to the application
of side channel attacks to smart card devices. Unfortunately, very little research has been done
to investigate side channel attacks on mobile devices. Attacks on mobile devices have
become an important issue as these devices become more pervasive and m-commerce on
these devices becomes more prominent. PDAs are particularly suitable for m-commerce
applications, and they are chosen to be the hardware platforms for this research.
Mobile devices are very vulnerable to EM analysis as their mobility makes their
EM emanations accessible to an attacker. Furthermore, most mobile devices have limited EM
shielding and are more susceptible to theft. However, little research has been done on EM
analysis of these mobile devices.
The most crucial components of secure m-commerce are entity authentication, data
integrity and non-repudiation, which are provided by public key protocols such as digital
signatures. However, public key protocols tend to be very computationally expensive, and
PDAs have very limited computational power and memory capacity. Therefore, the
research focus is on Elliptic Curve Cryptography (ECC) which is a very efficient public
key algorithm and is most suitable for mobile devices. ECC is also widely accepted by
research communities and industries. Hence ECC computation is the target algorithm
investigated in this research.
Hopefully, by understanding the threat of EM analysis attacks on ECC computations
of PDA devices, we can better secure these systems from adversaries.
1.2 Thesis Objective
The main purpose of the thesis is to present new research findings in the EM analysis of
ECC computations on PDA devices. To this end there are four objectives of this thesis.
The first objective is to investigate the techniques used to capture the EM emanations
from a PDA, which encompass locating the best source of EM emanation as well as the
most appropriate equipment and configurations.
Once the EM emanations are captured, they are processed and analyzed to extract
useful information. The second objective is to investigate and compare the different
analysis techniques such as time domain analysis, power spectral density (PSD) and
spectrogram.
The third objective is to present new and more effective methodologies of
differential EM analysis (DEMA) on ECC computations that are particularly suitable for
PDA devices. This includes discussion on how to partition the EM signals for differential
analysis, algorithms for DEMA, and general strategies for different ECC computation
algorithms.
The final objective is to investigate the novel application of artificial intelligence
programming paradigms to SEMA. The thesis details the signal preprocessing, training
and testing required for the AI technique. It also compares its performance with the
original template attack in SEMA.
1.3 Thesis Overview
This thesis is composed of 10 chapters, and the remaining chapters can be roughly divided
into 4 main parts. Chapters two to four provide the technical background material
required for understanding the concepts in this thesis. Chapter two provides information
on the origins of EM signals and common capture techniques. Chapter three gives a brief
background on different types of side channel attacks applied on power and EM
emanation. Chapter four introduces the mathematical background and implementations
of ECC algorithms on PDA.
Chapters five and six describe my research contributions. Chapter five presents
my new methodologies and strategies for performing DEMA. Chapter six presents my
invention that incorporates AI programming paradigm into a template attack model for
SEMA.
Chapters seven to nine present the experimental setup and results. Chapter seven
discusses experimental setup and methodology, which apply to both DEMA and SEMA
experiments. Chapter eight describes the experimental setup and results for DEMA.
Chapter nine describes the experimental setup and results for SEMA. This includes the
experimental steps for finding the parameters in the AI attack system.
Chapter ten is the conclusion chapter, which discusses the limitations,
countermeasures, summary and future work for this research.
2 Introduction to EM Signal Capture and Analysis
In the early 1950s, the U.S. government became concerned that an enemy could reconstruct
sensitive information from EM signals radiated from cryptographic equipment [Murray].
Some encrypted teletype units were found to radiate small traces of cleartext signals
beneath the normal encrypted output. Sophisticated equipment could be used to isolate and
amplify the cleartext signals.
Gandolfi et al. were the first to provide concrete results on EM attacks on modern
cryptographic devices [GMO01]. Soon after, Quisquater et al. extrapolated attack
strategies from power signals (SPA and DPA) to EM signals (SEMA and DEMA) [QS01].
The following sections describe the types of EM signals, capture techniques, their
application to PDA devices, and spectrogram analysis of these signals.
2.1 Origin and Types of EM Signals
EM emanation is caused by current flow within the control, I/O, data processing, or other
parts of a device. Any electrical current flowing through a conductor induces
electromagnetic emanations. For instance, during the switching of a CMOS gate, shown in
Figure 2-1, a short current pulse travels from the power line to the ground line, thereby
emitting an EM signal whenever the logic state flips.
Figure 2-1: CMOS gate
This partially explains the correlation between the EM signal and the transition’s
Hamming distance. However, not only do current carrying components produce their
own emanation, but they also affect emanations from other components due to coupling
and circuit geometry.
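The correlation between EM amplitude and transition activity can be sketched with the Hamming-distance leakage model commonly used in the side-channel literature. The class and constants below are illustrative assumptions for exposition, not part of the thesis implementation.

```java
// Sketch of the Hamming-distance leakage model: the EM amplitude at a
// switching instant is assumed proportional to the number of bits that
// flip between consecutive register states. GAIN and OFFSET are arbitrary.
public class HammingLeakageSketch {

    // Number of bit flips between the previous and current state.
    public static int hammingDistance(int prev, int curr) {
        return Integer.bitCount(prev ^ curr);
    }

    // Idealized leakage: a static offset plus a per-flip contribution.
    public static double leakage(int prev, int curr) {
        final double GAIN = 0.35;   // per-bit contribution (assumed)
        final double OFFSET = 1.2;  // static consumption (assumed)
        return OFFSET + GAIN * hammingDistance(prev, curr);
    }

    public static void main(String[] args) {
        // 0x0F -> 0xF0 flips all eight low bits: maximal byte transition.
        System.out.println(hammingDistance(0x0F, 0xF0)); // prints 8
    }
}
```

Under this model, a transition that flips more bits leaks a larger signal, which is the statistical handle that differential analysis exploits.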
Different areas of the chip radiate with different intensities and varying code
dependencies. However there is an important distinction between high energy signals
and high information signals. A data bus may be a good source of high information
signals as they may be correlated with the bus data. On the other hand, power lines may
produce high energy signals that have no data correlation. Experimentally the most
active points appear to be located near the CPU and data buses. Figure 2-2 is a 3-D
diagram showing the region of highest EM emanation on a processor [QS01]. One
should take precautions to make sure the high information signals are not overwhelmed
by the high energy but low information signals.
Figure 2-2: 3D diagram of EM emanation [QS01]
Although the spectral power of EM signals decreases with increasing clock
frequency of the computing device, the radiation effectiveness varies directly with
frequency [QS01]. Hence modern computing devices running at high clock frequencies
are more vulnerable to EM analysis.
One important difference between EM emanation and other side channels, such as
power signals, is that the output of even a single EM sensor consists of multiple
compromising signals of different types, strengths, and information content. Each active
component of the device produces and induces different emanations, which provide
different views of the events occurring within the devices. These views can be obtained
by using different types and positions of sensors, or by simply focusing on different
emanations captured by a single sensor. This is very different from power analysis where
there is only a single view of net current flow.
2.2 Capture of EM Signals
EM signals propagate via either radiation, conduction, or a complex combination of both
methods from the device. Conductive emanations consist of tiny currents found on all
conductive surfaces or lines attached to the device, possibly riding on top of stronger,
intentional currents within the same conductors, such as a power line. Current probes and
an oscilloscope can be used to measure these tiny currents. In fact, the capture and processing
equipment for conductive emanation is very similar to those for power analysis.
The EM radiation signals may be captured by various kinds of EM probes. An
example of these probes consists of a small highly conducting metal, such as copper or
silver plate, attached to a coaxial cable. Another type consists of a solenoid made of coiled
copper wire with an outer diameter between 150 and 500 microns, shown in Figure 2-3.
Since EM analysis without direct physical access is far more dangerous, the radiation
signal is the focus of this research.
The quality of the received signal improves if the equipment is shielded from
interfering EM emanations in the band of interest. Ideally EM analysis should be
performed inside a Faraday cage that shields the equipment from ambient EM
emanations. However, this is difficult to accomplish and it is far more productive to
ensure there is no strong source of interfering EM signals [GMO01]. There are two
general types of EM emanations that may correlate with secret data: direct emanation and
unintended emanation [AAR02].
Figure 2-3. Near field probe [GMO01]
2.3 Direct Emanation
Direct emanation is EM radiation induced by current flows within a computing device.
Since activities within a device are synchronized with the system clock, current flows
tend to occur in short bursts with sharp rising edges resulting in emanations observable
over a wide frequency band [AAR02]. Often, higher-frequency components can be
easier to detect, as noise and interference are prevalent in the lower frequency bands.
In general, direct emanation signals are weak and difficult to detect. In complex
circuits, isolating direct emanations may require tiny field probes positioned very
close to the signal source, or even direct attachment of the probes to the signal sources
[GMO01]. In some cases this requires decapsulating the device package. Filters and
amplifiers are necessary to improve the captured signal.
2.4 Unintended Emanation
Unintended emanations are caused by electrical and electromagnetic coupling between
components in close proximity, a consequence of the increasing miniaturization and
complexity of modern CMOS devices [AAR02]. Although these couplings are mostly
harmless to the device function, they provide a rich source of information signals. A
weak and otherwise undetectable information signal can modulate a strong carrier signal,
thereby allowing the information to be recovered.
There are two classes of unintended emanations. The first class is the amplitude-
modulated (AM) signal, which is caused by non-linear coupling between a carrier signal
and a data signal [AAR02]. The data signal can be extracted by demodulation using a
receiver tuned to the carrier frequency. A strong carrier source is the ubiquitous
harmonic-rich square-wave clock signal. An advantage of using a harmonic-rich carrier
such as a clock is that the attacker can choose higher harmonics of the clock frequency,
which radiate more effectively and lie in a frequency band with less noise and
interference. The other class is the phase-modulated (PM) signal, which results from
coupling between data signals and communication circuits. The data signal may then be
recovered by phase demodulation of the generated signal [AAR02].
In general, extracting information from unintended emanations is easier because a
modulated carrier propagates substantially farther than direct emanation. Modulated
carriers can be detected several feet away from the device [AAR02]. In contrast, direct
emanation must be detected by sensors within a couple of millimeters of the source.
Hence EM analysis can be launched from a distance and without resorting to invasive
techniques.
The probe should be connected to a receiver that demodulates the signal, which is
in turn connected to a digital scope. If the unintended emanation is an AM signal riding
on a harmonic-rich carrier, it may be advantageous to capture a signal modulated onto a
harmonic of the carrier, as high frequency signals propagate better. Lower harmonics
suffer from noise and interference, while higher harmonics have lower signal strength
due to non-ideal clock waveforms [AAR02]. The receiver/demodulator should be tuned to
the carrier harmonic with the best tradeoff between signal frequency and strength.
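To make the receiver's job concrete, the following sketch simulates an AM signal (a data-dependent envelope riding on a carrier) and recovers the data by the crudest form of demodulation: rectification followed by a moving-average low-pass filter. All parameters (sampling rate, carrier frequency, modulation depth) are invented for illustration and do not come from the thesis measurements.

```python
import numpy as np

fs = 50_000_000            # 50 MS/s sampling rate (assumed)
f_carrier = 3_000_000      # a 3 MHz clock harmonic acting as carrier (assumed)
t = np.arange(0, 0.001, 1.0 / fs)

# Hypothetical slow "information" signal leaking onto the carrier amplitude
data = 0.5 + 0.5 * np.sin(2 * np.pi * 1_000 * t)
captured = (1.0 + 0.3 * data) * np.cos(2 * np.pi * f_carrier * t)

# Crude AM demodulation: rectify, then low-pass with a moving average
rectified = np.abs(captured)
kernel = np.ones(501) / 501          # ~10 us window, spans ~30 carrier cycles
envelope = np.convolve(rectified, kernel, mode="same")

# The recovered envelope tracks the hidden data signal (edges trimmed)
corr = np.corrcoef(envelope[1000:-1000], data[1000:-1000])[0, 1]
```

A real receiver would mix down and filter around the chosen harmonic instead, but the envelope-tracking idea is the same.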
2.5 Benefits of EM Analysis of PDA
Common side channel techniques use power signals. However, EM signals are more
suitable for attacking PDAs. First of all, a PDA has a more powerful processor operating
at a higher clock frequency, which produces stronger EM radiation. Secondly, it is
inconvenient for an attacker to obtain power signals from a PDA because PDAs operate on
an internal battery as opposed to an external power source; obtaining a power signal
would require physical access to the device. On the other hand, obtaining EM signals
from a PDA is relatively easy because PDAs are mobile and their signals can be captured
by adversaries while the devices are in use. Finally, even if an attacker has obtained a
PDA device, it is easier to use EM emanation as a side channel. Measuring the power
drained from a PDA battery is like finding a needle in a haystack: a PDA contains many
components, such as a DSP processor, non-volatile memory, a radio receiver and an LCD
screen, and the attacker must find the component that produces compromising power
signals. With EM emanation, it is far more convenient to determine which component
produces the strongest information-rich signal. For these reasons, EM analysis is
particularly suitable for PDA devices.
2.6 Spectrogram
Typically, SEMA and DEMA are performed on time domain and frequency domain
signals. However, it is also possible to perform the analysis on spectrogram signals. A
spectrogram is a form of time-dependent frequency analysis: it consists of the frequency
domain signals of a successive sequence of time windows, as shown in Figure 2-4. The
vertical axis represents frequencies up to about 6 MHz, the horizontal axis shows time
increasing toward the right up to 2 ms, and the color indicates the frequency signal
amplitude.
Figure 2-4. Spectrogram
The optimal window size varies with the application and may be found
experimentally. There is a trade-off between frequency resolution and time resolution:
the window size is directly proportional to frequency resolution but inversely
proportional to time resolution. Higher frequency resolution provides more detail about
the frequency content; higher time resolution shows more precisely how the frequency
content changes over time. The frequency range is determined only by the sampling rate
of the signal acquisition.
The spectrogram windows should overlap so that signals near the edges of the
windows are not lost in the frequency analysis. A higher degree of overlap reduces the
loss of frequency information, but incurs heavier computation.
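The windowing and overlap described above can be sketched as a short-time Fourier transform. This is a minimal illustration, not the thesis's analysis code; the window size, overlap, and Hann window choice are assumptions.

```python
import numpy as np

def spectrogram(signal, window_size=256, overlap=128):
    """Magnitude STFT: FFT of successive, overlapping windowed frames."""
    step = window_size - overlap          # hop between window starts
    window = np.hanning(window_size)      # taper to reduce edge effects
    frames = []
    for start in range(0, len(signal) - window_size + 1, step):
        frame = signal[start:start + window_size] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # rows = time windows, columns = frequency bins
    return np.array(frames)

# A test tone at 1/8 of the sampling rate should peak in the matching bin
sig = np.sin(2 * np.pi * 0.125 * np.arange(4096))
spec = spectrogram(sig)
peak_bin = spec[0].argmax()               # 0.125 * 256 = bin 32
```

Doubling `window_size` halves the bin width (better frequency resolution) but spreads each row over twice the time, which is the trade-off noted above.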
The spectrogram has many important applications in engineering. In speech analysis,
for example, the phonemes of a spoken word have unique and distinguishable signatures
in the frequency domain, and a speech recognition system uses spectrograms to detect the
sequence of phonemes in a word. It was found that the spectrogram is also useful in
differential analysis attacks on either power or EM signals.
3 Introduction to Side Channel Attacks
A side channel is an unintended channel that leaks sensitive information from a
cryptographic computation. In theory a side channel exists in every cryptographic
system. The underlying implementations of all cryptographic algorithms are physical
processes where data elements are represented by physical quantities (e.g. electric charge)
stored in a physical structure (e.g. a transistor) [GMO01]. These physical quantities
require a minimum time to be sensed, transmitted, and stored. As well, all computations
involve state changes in the underlying physical structures which, by the laws of
thermodynamics, must cause an irreversible conversion from one form of energy to another.
Therefore all computations emit a certain amount of energy at distinguishable time
intervals, and these emissions form the basic components of a side channel.
Previous research in side channel attacks has focused on applications such as pay-
TV smart cards and prepayment meter tokens. The five types of side channel attacks
discussed here are timing analysis [K96], fault analysis [BDL], simple analysis [O02],
differential analysis [KJJ99] and template attacks [CRR02]. The latter three forms of
attack are commonly applied to power and EM signals, which correlate with bits of
internal storage during encryption. At each clock cycle, the activity of the transistors
produces a unique signature in these side channels that may be exploited to recover the
secret keying material. Power signals may be extracted from the device's battery or an
external power source; the EM signal may be captured with an appropriate EM probe.
3.1 Timing Analysis
The earliest and most primitive form of side channel attack uses the timing
characteristics of a cryptographic algorithm's implementation to break it. For example,
the length of time required to compute a scalar multiplication based on binary expansion
correlates with the Hamming weight of the scalar, since a point addition is performed
only when a scalar bit equals one.
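The correlation between running time and Hamming weight can be seen in a toy operation-count model of left-to-right double-and-add (this is an illustrative model, not a real timing measurement):

```python
def double_and_add_ops(d):
    """Count point operations in left-to-right double-and-add,
    as a toy proxy for execution time."""
    bits = bin(d)[2:]
    doubles = len(bits)            # one doubling per scalar bit
    adds = bits.count("1")         # one addition per 1 bit
    return doubles + adds

# Two scalars of equal bit length but different Hamming weight
t_sparse = double_and_add_ops(0b10000001)   # weight 2
t_dense = double_and_add_ops(0b11111111)    # weight 8
```

The denser scalar costs more operations, so its total time leaks its Hamming weight even though the bit length is identical.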
Paul Kocher et al. showed how measurements of the time required to perform
private key operations can be used to find the key exponents, thereby breaking the
cryptographic system [K96]. An obvious countermeasure is to remove any correlation
between the secret and the operation timing by adding redundant operations.
In general, timing analysis is not a practical cryptanalysis technique. First of all,
it is difficult to precisely measure the duration of a cryptographic operation. Secondly,
timing analysis provides only a limited amount of information about the secret in a
cryptographic system, such as the Hamming weight of a scalar. Therefore, timing
analysis is not considered a serious threat to security.
3.2 Fault Analysis
Fault analysis was initially proposed by Boneh et al. [BDL] and applies to the
algebraic structures used in public key cryptography. For an implementation of RSA
based on the Chinese remainder theorem, Boneh et al. showed that, given one faulty
version of an RSA signature, one can efficiently factor the RSA modulus with high
probability.
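The Boneh et al. observation can be demonstrated end-to-end with toy parameters. This sketch uses deliberately tiny, insecure primes of my own choosing: a signature is computed with RSA-CRT, one CRT half is corrupted, and a gcd with the modulus recovers a factor.

```python
from math import gcd

# Toy RSA-CRT parameters (illustrative, insecure sizes)
p, q = 1009, 1013
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
m = 123456 % n

# Correct CRT signature: compute mod p and mod q, then recombine
sp = pow(m, d % (p - 1), p)
sq = pow(m, d % (q - 1), q)
qinv = pow(q, -1, p)
s = (sq + q * ((qinv * (sp - sq)) % p)) % n

# Fault: the half computed mod p is corrupted before recombination
sp_faulty = (sp + 1) % p
s_faulty = (sq + q * ((qinv * (sp_faulty - sq)) % p)) % n

# s_faulty is still correct mod q but wrong mod p, so the gcd reveals q
recovered_factor = gcd(pow(s_faulty, e, n) - m, n)
```

Since the faulty signature verifies modulo q but not modulo p, the difference `s_faulty^e - m` is divisible by q alone, and the gcd factors n from a single faulty signature.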
Biham and Shamir proposed a related attack known as Differential Fault Analysis
(DFA), which applies to all common secret key cryptosystems [BS96]. In DFA, one set
of data is encrypted by a working device, while another set is encrypted by a device
with induced random faults. Comparing these sets of data can yield information about
the secret key. DFA can find the last DES round key using fewer than 200 ciphertexts,
and can break triple-DES with a similar number of ciphertexts.
The discovery of fault analysis shows the importance of verifying the correctness
of computational results for security reasons. For instance, a device that generates an
ECC signature should verify its correctness before it is issued. On detecting a faulty
computation the ciphertext needs to be suppressed to protect the secret key.
3.3 Simple Analysis on Power/EM Signals
Simple Power Analysis (SPA) uses power consumption data from a single encryption
to extract secret information [O02]. It is possible to use power traces from multiple runs
of the encryption, provided each run encrypts the same plaintext with the same secret. For
simplicity this section describes simple analysis with respect to power signals only;
however, these techniques are directly applicable to EM signals, where their counterpart
is termed Simple EM Analysis (SEMA). There are no known simple analysis results on
ECC computations on PDA devices.
Very often, keying information is extracted from a single sample due to leakage
from the execution of key-dependent code and/or the use of instructions that leak
substantial information into the side channel above the noise. The adversary is assumed
to have fairly explicit knowledge of the analyzed cryptosystem. In particular, he knows
the times at which the power consumption is correlated with part of the secret.
Different operations in encryption/decryption produce distinguishable power
signatures. For example, the ECC point addition and double operations should produce
distinguishable power signatures. An adversary examining the power consumption can
determine the sequence of operations in the encryption/decryption, thereby deciphering
the secret key of the cryptosystem.
However, SPA is not always practical because it requires that the adversary have
detailed knowledge of the encryption algorithm. Furthermore, it is often difficult to
distinguish the different operations from the power signals. This thesis describes an
innovative approach for accurately recognizing the side channel signals of different
operations.
3.4 Differential Analysis on Power/EM Signals
Differential Power Analysis (DPA) was proposed by Paul Kocher et al. in 1998 [KJJ99].
The DPA methodology can be applied to EM signals, where it is termed Differential
EM Analysis (DEMA). There are no known differential analysis results on PDA devices.
DPA relies on statistical analysis of a large number of samples in which the same
keying material operates on different data. An adversary captures the power consumption
of many runs of the encryption operation using the same secret key but different
plaintexts. The adversary discovers the secret key by guessing each key bit and using
statistical techniques to verify the guess.
The adversary begins by identifying a partition bit of a good intermediate variable
within the encryption algorithm. A good intermediate variable is one that contributes
significantly to the power consumption, perhaps one that is accessed repeatedly in the
algorithm. As well, it depends only on the plaintext and one key bit. For instance, the
input point of an ECC point operation may be a good candidate. The adversary makes an
arbitrary guess at a key bit value, and calculates the value of the partition bit based on
this guess.
The adversary divides the power consumption traces into two sets: one set for a
partition bit value of 1 and another for a value of 0. The adversary then computes the
differential signal, the difference of the average power consumption of the two sets of
traces.
When the guess at the key bit is correct, the calculated partition bit value is correct
for every trace. Consequently the differential signal shows significant spikes at the times
when the partition bit correlates with power consumption, perhaps when the bit is being
accessed. On the other hand, if the guess is incorrect, both sets have very similar average
power consumption and no noticeable spikes appear in the differential signal. Either
way, the adversary discovers the true key bit value.
The adversary then moves on to the next key bit: he guesses the bit, partitions the
traces based on this guess, and computes the differential signal, thereby recovering the
key bit value. This process continues until the entire key is discovered.
DPA is superior to SPA in that the adversary does not need specific information
about how the analyzed device implements its function. In particular, the adversary can
be ignorant of the specific times at which the power consumption is correlated with the
secret; it is only necessary that the correlation be reasonably consistent.
However, DPA requires knowledge of the plaintexts, and so applies only when the
plaintexts are known or chosen. This is not true for SPA.
3.5 Template Attack on Power/EM Signals
The template attack is derived from signal detection and estimation theory, and is
the strongest form of side channel attack possible in an information-theoretic sense
[CRR02]. There are no known results of template attacks on ECC computations of any
implementation.
While other common techniques view noise as distortion to be reduced or
eliminated, template analysis views noise as a source of information and focuses on
modeling the noise to fully extract the information present in a single sample. A
template is a model that characterizes the signal and noise of an operation, based on the
assumption that the captured signal is a linear combination of these components. It
requires the adversary to have access to an identical experimental device which he can
program to his choosing and from which he can obtain as many side channel samples as
needed. This is a reasonable requirement if the target device is a widely available
commercial product.
For a device that can perform one of K operations {O1, ..., OK}, an adversary can
use a template attack to identify the operation performed given only one sample. The
different operations could be different point operations in ECC, or the same operation
applied to different input data. A template is derived from L samples (typically one
thousand) taken on the experimental device for each of the K operations. The signal
component of the template is the average signal Mi. Typically, only a subset of N points
with large deviation is selected to build the template. The noise vector of each sample
is the difference between the sample signal and the average signal. Mathematically, the
noise vector Ni(T) for sample T of operation Oi is computed as follows:

N_i(T) = (T[1] - M_i[1], \ldots, T[N] - M_i[N])
The noise is assumed to be Gaussian, and the noise component of the template is
characterized by the noise covariance matrix computed from the L samples of operation Oi:

\Sigma_{N_i} = \begin{bmatrix}
\operatorname{cov}(N_i[1], N_i[1]) & \cdots & \operatorname{cov}(N_i[1], N_i[N]) \\
\vdots & \ddots & \vdots \\
\operatorname{cov}(N_i[N], N_i[1]) & \cdots & \operatorname{cov}(N_i[N], N_i[N])
\end{bmatrix}
Once a template is built for each of the K operations, the probability that a sample
originated from each possible operation can be calculated using the equation below, and
the most likely operation for an observed sample can then be found:

p(n) = \frac{1}{\sqrt{(2\pi)^N \, |\Sigma_{N_i}|}} \exp\!\left(-\tfrac{1}{2}\, n^{T} \Sigma_{N_i}^{-1} n\right)

where n is the observed noise vector.
4 Introduction to Elliptic Curve Cryptography
Public key cryptosystems are commonly used in digital signatures and other
authentication applications, where a verifier can check the authenticity of the signer
without knowledge of the signer's secret key. Common public key cryptosystems include
ECC, RSA and ElGamal. Although public key cryptography is vital for many security
applications, it tends to be computationally costly. The efficiency of Elliptic Curve
Cryptography (ECC) makes it suitable for resource-constrained devices such as
smartcards and PDAs.
The following sections describe the mathematics of ECC, the benefits of its use on
PDAs, common implementations, and common countermeasures against side channel
attacks.
4.1 Mathematical Overview
ECC is widely considered to be a more efficient public key system. Since it was
first proposed by Victor Miller [Miller] and Neal Koblitz [Koblitz] in 1985, it has
received much attention from the research community and industry.
The principles of ECC are similar to those of other public key cryptosystems in
that they operate on the elements of a defined cyclic group, and protocols of other public
key systems can be applied to ECC. However, ECC can provide an equivalent amount of
security using smaller key sizes. As well, it allows for many different implementations
tailored to different requirements.
An elliptic curve is a set of points (x, y) that are solutions of a bivariate cubic
equation over a finite field. An elliptic curve defined over a large prime field GF(p)
satisfies an equation of the form

y^2 = x^3 + ax + b, where a, b ∈ GF(p).
When defined over a binary field GF(2^n), the elliptic curve satisfies an equation of
the form

y^2 + xy = x^3 + ax^2 + b, where a, b ∈ GF(2^n).
ECC usually employs curves whose order is the product of a large prime and a
very small integer h, called the cofactor. The cofactor h is often 1. Over prime fields
the parameter a is often chosen to be -3 for efficient implementation. The set of points
on the elliptic curve, together with the special point O known as the point at infinity,
forms the elements of the cyclic group. The group operations are point addition and
doubling, and the point at infinity is the additive identity.
Scalar multiplication in ECC is analogous to integer exponentiation in RSA. It is
the operation of adding a point P to itself d times to compute dP. Scalar multiplication
is the dominant operation in ECC protocols and is a logical point of attack for
cryptanalysts.
4.2 Benefits for PDA Implementation
The security of ECC protocols depends on how well the scalar d is hidden in the
scalar multiplication, since this scalar is related to the secret private key. Common
mathematical attacks involve solving the elliptic curve discrete logarithm problem
(ECDLP): the problem of finding the secret scalar d given the known points P and dP.
No subexponential time algorithm is known for the ECDLP on non-supersingular
elliptic curves.
The ECDLP in ECC is analogous to the DLP in ElGamal and the integer factoring
problem in RSA; however, the ECDLP appears to be much harder over a finite field of
the same size. Therefore ECC can offer equivalent security with fewer key bits. Fewer
key bits reduce the computational and bandwidth requirements of the cryptosystem: key
size is directly related to the computational time of a cryptosystem, and a small key
requires less bandwidth to transmit. These properties make ECC ideal for resource-
constrained devices such as PDAs, which have less powerful processors, a limited power
supply and slow wireless network connectivity.
4.3 Implementation
There are many implementations of scalar multiplication. The most
straightforward is the double-and-add approach based on the binary expansion of d, as
shown in Algorithm 4-1. This implementation is used for all the experiments described
in this thesis.
One improvement to this algorithm is to convert the scalar into non-adjacent form
(NAF). The NAF of a scalar d is a signed binary expansion of d in which no two
adjacent digits are non-zero. The NAF has the fewest nonzero coefficients of any signed
binary expansion, and there exists a simple algorithm for converting to it. The modified
scalar multiplication algorithm using the NAF is the addition-subtraction method, shown
in Algorithm 4-2.
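One standard conversion algorithm (a sketch of the usual mod-4 recoding, not necessarily the variant used in the thesis experiments) is:

```python
def to_naf(d):
    """Convert a positive integer to non-adjacent form,
    least significant digit first."""
    naf = []
    while d > 0:
        if d & 1:
            digit = 2 - (d % 4)   # +1 or -1, chosen so the next bit becomes 0
            d -= digit
        else:
            digit = 0
        naf.append(digit)
        d //= 2
    return naf

# 15 = (1111)2 has NAF 16 - 1, i.e. digits (1, 0, 0, 0, -1) MSB-first
digits = to_naf(15)
value = sum(di * (1 << i) for i, di in enumerate(digits))
```

Choosing the signed digit so that d becomes divisible by 4 guarantees the next digit is zero, which is exactly the non-adjacency property.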
Another way to speed up scalar multiplication is the window method, which takes
advantage of precomputed lookup tables. Essentially, the scalar bits are divided into
appropriately sized windows, and the point multiplication by each window of bits is
found from a lookup table.
Algorithm 4-1: Double-and-Add Scalar Multiplication

Input: P and d = (dh-1, ..., d0)2
Output: Q = dP

Q := 0
for (j := h - 1; j >= 0; j--) {
    Q := dbl(Q)
    if (dj = 1) Q := add(Q, P)
}
return Q
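For reference, Algorithm 4-1 can be made executable over a small textbook curve. The curve y^2 = x^3 + 2x + 2 over GF(17) and the base point (5, 1) are illustrative toy choices only; real ECC uses fields of 160 or more bits.

```python
# Affine arithmetic on the toy curve y^2 = x^3 + 2x + 2 over GF(17)
P_FIELD, A = 17, 2

def pt_add(P, Q):
    """Add two points; None represents the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None                                   # P + (-P) = O
    if P == Q:                                        # doubling slope
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_FIELD) % P_FIELD
    else:                                             # chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)

def scalar_mult(d, P):
    """Algorithm 4-1: left-to-right double-and-add."""
    Q = None                                          # Q := 0
    for bit in bin(d)[2:]:                            # most significant bit first
        Q = pt_add(Q, Q)                              # Q := dbl(Q)
        if bit == "1":
            Q = pt_add(Q, P)                          # Q := add(Q, P)
    return Q
```

With base point (5, 1), which has order 19 on this curve, `scalar_mult(19, (5, 1))` returns the point at infinity, as expected for a point multiplied by its order.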
Algorithm 4-2: Add-Subtract Scalar Multiplication

Input: P and d = (dh-1, ..., d0)2
Output: Q = dP

Q := 0
for (j := h - 1; j >= 0; j--) {
    Q := dbl(Q)
    if (dj = +1) Q := add(Q, P)
    if (dj = -1) Q := sub(Q, P)
}
return Q

ECC and other public key systems are mostly used for authentication and key
exchange. The ECC protocol for digital signatures is the elliptic curve digital signature
algorithm (ECDSA), described in ANSI X9.62. Two ECC protocols for key exchange are
the elliptic curve Diffie-Hellman (ECDH) key exchange, described in ANSI X9.63, and a
scheme developed by Menezes, Qu, and Vanstone (ECMQV).
Public key systems are very slow at encryption compared to their symmetric key
counterparts. However, they can be used to encrypt a symmetric key shared between
communicating parties. An ECC protocol for encryption is the elliptic curve integrated
encryption scheme (ECIES), described in ANSI X9.63.

4.4 Countermeasures to Thwart Side Channel Attacks
There are three basic approaches to resisting simple analysis: indistinguishable formulas
for point operations, an identical operation sequence regardless of key bits, and random
addition chains.
Two classes of elliptic curves support the first approach. The Jacobi form and
Hesse form [LS01] [JQ01] elliptic curves achieve indistinguishability because they use
the same formulas for point doubling and addition. However, this requires specifically
chosen curves and is not generally applicable. Brier and Joye proposed an
indistinguishable addition and doubling algorithm applicable to Weierstrass form curves
[BJ02], but it fails on certain inputs, making it vulnerable to attacks [IT02b].
The second approach is applied in two scalar multiplication algorithms: double-
and-add-always, shown in Algorithm 4-3 [Cor99], and the Montgomery ladder [OS00,
Mo87]. The Montgomery ladder algorithm was later extended to general curves [BJ02,
IT02a].
Algorithm 4-3: Double-and-Add-Always Scalar Multiplication

Input: P and k = (kh-1, ..., k0)2
Output: Q = kP

Q[0] := 0
for (j := h - 1; j >= 0; j--) {
    Q[0] := ECCDBL(Q[0])
    Q[1] := ECCADD(Q[0], P)
    Q[0] := Q[kj]
}
return Q[0]

These algorithms resist simple analysis because the same sequence of point
operations is performed regardless of the scalar value; an attacker cannot decipher the
scalar bits even if the point operations are distinguishable in the side channel.
The third approach is to use a special addition chain with a sequence of additions
and doublings that can mutate randomly. One algorithm using this approach is the
randomized addition-subtraction chain [OA01]. Instead of using only addition and
double operations as in the standard scalar multiplication algorithms, this algorithm can
also use subtraction to perform scalar multiplication. The advantage is that there may be
plural addition-subtraction chains for a given scalar; this follows directly from the fact
that signed digit representations are redundant. For instance, there is a unique binary
representation (1111)2 for the number 15, but the number 15 has multiple signed digit
representations such as (1000Ī)SD and (10Ī11)SD. This allows the algorithm to choose
among permutations of the three point operations to perform scalar multiplication,
making simple and differential analysis difficult.
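The signed-digit redundancy claimed above is easy to check directly; this small sketch evaluates the binary form and the two signed-digit forms of 15:

```python
def sd_value(digits):
    """Evaluate a signed-digit representation, most significant digit first."""
    v = 0
    for d in digits:
        v = 2 * v + d   # Horner's rule in base 2 with digits in {-1, 0, 1}
    return v

# (1111)2, (1000-1)SD and (10-111)SD, using -1 for the digit written as a bar
reps = [[1, 1, 1, 1], [1, 0, 0, 0, -1], [1, 0, -1, 1, 1]]
values = [sd_value(r) for r in reps]
```

All three evaluate to 15, so a randomized recoder can pick among them, and the resulting sequence of add/double/subtract operations differs from run to run.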
Scalar multiplication algorithms that are secure against simple analysis may still be
vulnerable to differential analysis. Fortunately, it is easy to enhance an algorithm to
resist differential analysis as well. There are two general approaches: randomizing the
base point P and randomizing the scalar k.
One application of the first approach is point blinding [Cor99], which blinds the
side channel information by adding a random point R to the input and subtracting a
random point S from the output of a scalar multiplication algorithm that is already
resistant to simple analysis. The point S equals the point kR, hence the final result is
computed correctly.
The points R and S are updated before each multiplication to reduce leakage of
these values in the side channel. They can be conveniently updated as follows:

R := (-1)^b * 2R,  S := (-1)^b * 2S,  where b is a random bit.
The second application of base-point randomization is projective randomization
[Cor99]. There are many varieties of projective coordinates [BHL00]. A point
represented in Jacobian coordinates, P = (X:Y:Z), for example, can be equivalently
expressed as (r^2·X : r^3·Y : r·Z), where r is any nonzero finite field element. The
countermeasure transforms the base point (X:Y:Z) into (r^2·X : r^3·Y : r·Z) with a
random r before starting the scalar multiplication, which effectively randomizes any side
channel information.
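The equivalence of the randomized representation can be verified numerically. The field size and coordinate values below are illustrative toys chosen for the example, not thesis parameters.

```python
# Jacobian coordinates: affine x = X/Z^2, y = Y/Z^3 over GF(p)
p = 17
X, Y, Z = 6, 3, 1          # toy projective point
r = 5                      # random nonzero field element

# Randomized representation (r^2*X : r^3*Y : r*Z)
Xr, Yr, Zr = (r * r * X) % p, (r ** 3 * Y) % p, (r * Z) % p

def to_affine(X, Y, Z, p):
    """Map Jacobian coordinates back to affine (x, y)."""
    zinv = pow(Z, -1, p)
    return (X * zinv ** 2 % p, Y * zinv ** 3 % p)

# Both representations collapse to the same affine point
same = to_affine(X, Y, Z, p) == to_affine(Xr, Yr, Zr, p)
```

The intermediate projective values differ completely between runs with different r, which is what decorrelates the side channel, yet the affine result is unchanged.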
The last application of base-point randomization was proposed by Joye and Tymen
[JT01] and is based on randomly selected isomorphisms between elliptic curves. The
base point P = (x, y) and the curve parameters a, b can be randomized into
P' = (r^2·x, r^3·y) and a' = r^4·a, b' = r^6·b, which randomizes any side channel
information.
One application of scalar randomization is scalar blinding [Cor99], which
randomizes the scalar k into (k + r·#E(K)), where #E(K) is the order of the elliptic curve
group over the field K and r is a random number.
Another application of this approach is randomized multiplier recoding [JT01],
which applies to Koblitz curves over GF(2^m). The algorithm randomly chooses one of
multiple NAF expansions of the scalar k.
Some countermeasures against differential analysis are designed for the window
method of scalar multiplication. The first is the Overlapping Window Method
[IYTT02], which counters differential analysis by overlapping adjacent windows, thus
allowing plural possible window values for a given scalar k. The second is the
Randomized Table Window Method [IYTT02], which counters differential analysis by
randomizing the precomputed table and normalizing the randomized data to obtain the
correct final result.
5 Proposed Methodology of DEMA
This chapter presents the proposed methodology of DEMA against an ECC
implementation on a PDA. There are no known DEMA results for ECC computations on
PDA devices.
The concept of differential EM analysis (DEMA) was first proposed by Quisquater
et al. [QS01], and is modeled after DPA. It involves statistical analysis of multiple traces
of EM emanations.
As discussed in section 2.5, the EM side channel is particularly devastating for
PDA devices due to their mobility and device characteristics. The point of attack in ECC
algorithms is the scalar multiplication, as it is the dominant operation in ECC
cryptographic operations. In most cases, the base point of the scalar multiplication is a
fixed system parameter, and the scalar is derived from the secret private key and some
random secret. Recovering the scalar would compromise the confidentiality of the secret
private key, and hence the scalar must be protected. The difficulty of the ECDLP makes
it hard to find the secret scalar even with knowledge of the input and output points. The
goal of DEMA is to recover this secret scalar from the EM side channel with
significantly fewer computational resources and less time.
In DEMA the attacker splits the EM traces into two sets, depending on a guess of
one or a small group of scalar bits. If the guess is correct, the attacker should detect
significant differences in the average EM signals of the two sets.
An optimal way to perform trace splitting is described in section 5.1. After the
traces are split, statistical techniques are applied to determine whether the difference
between the two sets is significant; this verifies the guess and consequently recovers the
secret scalar. This process is described in section 5.2. It was found that the frequency
domain and the spectrogram may also be useful in DEMA; the analogous processes are
described in section 5.3.
The discussion of DEMA focuses on algorithms that are resistant to simple
analysis, as SEMA is more suitable for attacking those that are vulnerable to simple
analysis. Different DEMA strategies apply to different scalar multiplication algorithms.
The simplest case, described in section 5.4, is DEMA on a scalar multiplication in which
the attacker can clearly distinguish between double and addition operations. However,
some scalar multiplication algorithms use indistinguishable formulas for point addition
and doubling; DEMA on such algorithms is more difficult and is described in section
5.5. Finally, DEMA on the window method, the most common and efficient way of
performing scalar multiplication, is described in section 5.6.
5.1 Proposed Trace Splitting Strategy
The attacker must pick a partial value for trace splitting that depends on the input
point and varies with each part of the scalar. It is best, from the attacker's point of view,
if this value is affected by a small part of the scalar at each iteration of the algorithm.
For example, in the standard scalar multiplication, one should pick a partial value
that changes as each scalar bit is processed. In the window method, a desirable partial
value would change after each window of the scalar is processed. Only one bit value is
needed to partition the traces into two bins. Optimally, the attacker should pick the
partial variable, and the bit of this variable, that have the biggest impact on the EM
emanation.
Some literature claims that the Hamming weight of the partial value affects the
magnitude of the emanation. However, there is no evidence that a function operating on
an operand of higher Hamming weight produces a higher or lower level of EM
emanation. A register built in CMOS technology does not dissipate more energy simply
because more of its bits are set to 1, since there is no static current in CMOS transistors.
On the other hand, the Hamming distance of a register value change may correlate with
energy dissipation and EM emanation.
In fact, each bit has a differing impact on the EM side channel. The best partition
bit appears to be the most significant bit (MSB) of an input point coordinate (the x or y
coordinate) of a point operation. It is simple to see that an input point has a great impact
on the computations within a point operation: the coordinates of the input are used in
numerous prime field computations over GF(p), such as squaring, multiplication, and
addition. The reason for choosing the MSB as the partition bit is that when the MSB is
1, there is a much higher probability that subsequent finite field computations with the
coordinate will produce a carry-out, which triggers an avalanche of different
computations producing distinguishable side channel signals.
Typically, extra computations are performed to handle a carry-out in a finite field
operation. For example, when a carry-out occurs during the modular addition shown in
algorithm 5-1, the prime modulus p must be subtracted from the result. This extra field
subtraction produces a distinguishable signal in the EM side channel.
Algorithm 5-1: Modular Addition
Input: A modulus p, and integers a, b ∈ [0, p−1].
Output: c = (a + b) mod p.
1. c0 ← Add(a0, b0).
2. For i from 1 to t−1 do: ci ← Add_with_carry(ai, bi).
3. If the carry bit is set, subtract p from c = (c_{t−1}, …, c1, c0).
4. If c ≥ p then c ← c − p.
5. Return(c).
Similarly, when a borrow occurs during the modular subtraction in algorithm 5-2, the
modulus p must be added to the result. Again, the extra addition produces
distinguishable EM signals.
Algorithm 5-2: Modular Subtraction
Input: A modulus p, and integers a, b ∈ [0, p−1].
Output: c = (a − b) mod p.
1. c0 ← Subtract(a0, b0).
2. For i from 1 to t−1 do: ci ← Sub_with_borrow(ai, bi).
3. If the borrow bit is set, add p to c = (c_{t−1}, …, c1, c0).
4. Return(c).
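The conditional reduction in algorithms 5-1 and 5-2 can be sketched as follows. This is a big-integer sketch, not the multi-word implementation described above, and the NIST P-192 prime is assumed purely for illustration:

```python
# Sketch of Algorithms 5-1 and 5-2 using Python big integers. The branch taken
# only on a carry-out (or borrow) is the data-dependent step that leaks into
# the EM side channel.
P = 2**192 - 2**64 - 1  # NIST P-192 prime, assumed here for illustration

def mod_add(a, b, p=P):
    c = a + b
    if c >= p:   # carry-out: the extra subtraction emits a distinguishable signal
        c -= p
    return c

def mod_sub(a, b, p=P):
    c = a - b
    if c < 0:    # borrow: the extra addition emits a distinguishable signal
        c += p
    return c
```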
The coordinates of the ECC points are finite field elements. In the DblJJ point operation,
the x-coordinate of the input is fed directly as input to three finite field operations: scalar
multiplication by 4, modular addition, and modular subtraction. Scalar multiplication by
4 is implemented as a bitwise shift to the left, so a carry-out occurs whenever one of the
two most significant bits of the field element is one. This carry-out can be observed in
the EM side channel.
The MSB of the x-coordinate also changes the probability of a carry-out in the
addition operation. When the MSB is 1, the probability of a carry-out in an addition
between the x-coordinate and a random finite field element is approximately ¾; when
the MSB is 0, the probability is approximately ¼.
To see why, consider the addition of the x-coordinate and a random element over
GF(p): when the x-coordinate equals n, there are n different values of the random
element that cause a carry-out. When the MSB is 0, the x-coordinate ranges from 0 to
2^191 − 1, so the number of ways a carry-out can occur is the sum of the arithmetic
series from 0 to 2^191 − 1. When the MSB is 1, the x-coordinate ranges from 2^191 to
2^192 − 1, so the count is the sum of the arithmetic series from 2^191 to 2^192 − 1. The
total number of combinations of an x-coordinate (with its MSB fixed) and a random
element is 2^191 × 2^192 = 2^383 ≈ p²/2. For simplicity, the calculations below assume
p ≈ 2^192. The calculations are as follows.
When MSB = 0:
  P(overflow) ≈ (1/2^383) · Σ n,  for n = 0 to 2^191 − 1
  P(overflow) ≈ (1/2^383) · (2^191 − 1)(2^191)/2
  P(overflow) ≈ (1/2^383) · 2^381
  P(overflow) ≈ 1/4

When MSB = 1:
  P(overflow) ≈ (1/2^383) · Σ n,  for n = 2^191 to 2^192 − 1
  P(overflow) ≈ (1/2^383) · (2^192 − 2^191)(2^192 + 2^191 − 1)/2
  P(overflow) ≈ (1/2^383) · (2^191)(3 · 2^191)/2
  P(overflow) ≈ (1/2^383) · 3 · 2^381
  P(overflow) ≈ 3/4
Using a similar analysis, one can show that the MSB of the x-coordinate changes the
probability of underflow in the subtraction operation. When the MSB is 1, the
probability of underflow in a subtraction between the x-coordinate and a random finite
field element is approximately ¼; when the MSB is 0, the probability is approximately ¾.
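The claimed carry-out probabilities are easy to check numerically. The following simulation is my own sketch, not from the thesis: a 16-bit field stands in for the 192-bit one, and the modulus is idealized to a power of two as in the approximation above.

```python
# Monte Carlo check: adding an x-coordinate to a uniformly random field element
# overflows with probability ~3/4 when the MSB of x is 1, and ~1/4 when it is 0.
import random

random.seed(1)
BITS = 16
p = 2**BITS  # idealized modulus, analogous to p ~ 2^192 in the text

def carry_prob(msb, trials=100_000):
    hits = 0
    for _ in range(trials):
        x = random.randrange(p // 2, p) if msb else random.randrange(0, p // 2)
        r = random.randrange(0, p)
        hits += (x + r) >= p  # carry-out of the addition
    return hits / trials

p_msb1 = carry_prob(1)
p_msb0 = carry_prob(0)
```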
In summary, since the MSB value influences, both deterministically and probabilistically,
whether modulo reduction occurs in the finite field operations, it correlates with the
signals in the EM side channel.
With MSB partitioning, a correct guess of the key bits generates distinguishable
spikes in the differential signal. Assuming the guesses of the key bits are correct, the
attacker obtains one bin of traces from a point operation with a high frequency of
carry-outs, and another bin from a point operation with a low frequency of carry-outs,
as shown in figure 5-1.
Figure 5-1: MSB Partitioning
5.2 Proposed Differential Analysis of Traces
Once the traces are partitioned, statistical techniques are applied to the sets to verify the
scalar bit guess. For simplicity, the case where the attacker is guessing one key bit at a
time will be considered. The more complicated case where the attacker must guess a
window of bits at a time is described in section 5.6.
Essentially, if the bit guess is correct, there should be a significant difference
between the averages of the two sets of signals. However, one needs a systematic way to
decide whether the differential signal is significant by comparing it to a reference signal.
The simplest reference is a constant signal, as in figure 5-2, where the significance of a
peak is its multiple of this reference. However, this reference does not account for the
varying degree of noise in the EM signals at different time points.
[Figure: differential signal plotted against a constant reference line, amplitude versus time]
Figure 5-2: Constant Reference Signal
A better reference signal is the standard deviation of difference of means
(SD-DOM), computed as in algorithm 5-3. The SD-DOM is a measure of the variability
of the EM traces: it peaks at time points with high variability.
Algorithm 5-3: Standard Deviation of Difference of Means
Input: V0, V1 = the two sets of signals (trace bins).
Output: b = standard deviation of difference of means.
STD() = standard deviation function; SIZE() = size of a signal set.
SD_DOM(V0, V1):
1: D0 ← STD(V0)
2: D1 ← STD(V1)
3: u0 ← SIZE(V0)
4: u1 ← SIZE(V1)
5: R ← sqrt(D0²/u0 + D1²/u1)
6: return R
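With NumPy, the SD-DOM reference of algorithm 5-3 reduces to a few lines. This is a sketch; the array layout, one trace per row, is my assumption:

```python
# SD-DOM reference signal: pointwise standard deviation of the difference of
# the two bin means, computed from the per-bin stds and the bin sizes.
import numpy as np

def sd_dom(v0, v1):
    v0, v1 = np.asarray(v0, float), np.asarray(v1, float)
    d0 = v0.std(axis=0)   # pointwise std of bin 0
    d1 = v1.std(axis=0)   # pointwise std of bin 1
    return np.sqrt(d0**2 / len(v0) + d1**2 / len(v1))
```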
In figure 5-3, the SD-DOM signal indicates that the EM traces are noisy from time
point 2 to 3, while they are quiet from time point 5 to 6. Even though there are two
identical peaks in the differential signal, the second one is considered more significant
because it occurs at time points where the EM traces are quiet. If the constant reference
signal were used, as in figure 5-2, both peaks would be considered equally significant.
[Figure: differential signal plotted against the SD-DOM reference, amplitude versus time]
Figure 5-3: SD-DOM Reference Signal
Still, even armed with this definition, there are multiple ways to decide whether the
overall differential signal is significant. One way is to simply take the most significant
peak and calculate the ratio of its amplitude to the reference. This works well when the
differential signals for correct and incorrect bit guesses are easily distinguishable, and the
most significant peaks of the two signals are far apart, as shown in figure 5-4. However,
for noisy signals from PDA devices, the most significant peaks resulting from correct and
incorrect guesses are sometimes close together, as in figure 5-5.
[Figure: differential signals for a correct and a wrong guess with well-separated peaks, shown against the reference]
Figure 5-4: Ideal Differential Signal
[Figure: differential signals for a correct and a wrong guess with peaks close together, shown against the reference]
Figure 5-5: Actual Differential Signal
A better approach is to consider a top percentile of peaks. A reasonable criterion for a
significant differential signal would require, for example, that the most significant one
percentile of peaks be greater than κ · SD_DOM. The most suitable value for the
coefficient κ should be found experimentally. Algorithm 5-4 defines how to calculate the
ratio between the differential signal and the reference signal.
Algorithm 5-4: Ratio between Differential and Reference Signal
Input: T0, T1 = the two bins of EM traces; m = number of time points per EM trace.
Output: R = ratio between the differential and reference signal.
MEAN() = mean of a signal set.
DEMA(T0, T1, m):
1: S ← SD_DOM(T0, T1)
2: D ← MEAN(T1) − MEAN(T0)
3: for each t, t ∈ {0, 1, …, m−1}:
4:   R(t) ← abs(D(t)) / S(t)
5: return R
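A NumPy sketch of algorithm 5-4, again with the assumed one-trace-per-row layout and the SD-DOM reference inlined:

```python
# Ratio between differential and reference signal: the differential signal is
# the difference of the bin means, and each time point is scored as a multiple
# of the SD-DOM reference at that point.
import numpy as np

def dema_ratio(t0, t1):
    t0, t1 = np.asarray(t0, float), np.asarray(t1, float)
    s = np.sqrt(t0.std(axis=0)**2 / len(t0) + t1.std(axis=0)**2 / len(t1))
    d = t1.mean(axis=0) - t0.mean(axis=0)  # differential signal
    return np.abs(d) / s                   # significance per time point
```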
5.3 Proposed Differential Analysis in Frequency and Spectrogram
Previous research on differential analysis focuses on the analysis of time domain
signals. However, differential analysis is also applicable to frequency domain signals. In
theory, peaks in a time domain differential signal should also appear in the frequency
domain, as any change in a time domain signal induces changes in the frequency domain
signal. The frequency domain is particularly useful for second-order differential analysis,
where correlation exists between the EM emanations at two time points [WW2]. As well,
frequency analysis may reveal loops and other repeating structures in an algorithm that
cannot be seen with time domain analysis. More importantly, frequency signals are less
sensitive to the random jitters and delays of time signals caused by slight variations in
instruction execution times, which are very common in more advanced mobile devices.
However, there are two problems with using frequency domain signals in
differential analysis. First, they reveal no information about when data-dependent
operations occur; this timing information is very useful, as it helps an adversary focus the
signal analysis on those operations. Second, peaks in the frequency domain caused by an
event of short duration may be indiscernible if the acquisition duration is much longer.
The solution to both problems is the spectrogram, a time-dependent frequency analysis.
Time domain analysis and frequency domain analysis are essentially two special
cases of spectrogram, as shown in figure 5-6. Frequency domain analysis is equivalent to
spectrogram with one time window. Time domain analysis is equivalent to spectrogram
with the same number of time windows as sample points. The window size is
appropriately chosen to find the optimal balance between the advantages and
disadvantages of the time domain and frequency domain analysis.
Figure 5-6: Relationship between Different Signals
The steps for creating a spectrogram are shown in algorithm 5-5. The first
component is the Fast Fourier Transform (FFT), which produces a frequency domain
signal; by the Nyquist criterion, the size of this frequency signal is half the size of the
time window. The second component is an element-wise product between the frequency
signal and a Hamming window. The Hamming function suppresses the Gibbs
phenomenon in spectral windowing [OSB99]. The remainder of the DEMA algorithm
proceeds in the same way as for time-domain analysis.
It is sometimes advantageous to have overlap between adjacent time windows.
Overlap reduces the loss of frequency information for signal features close to the edges
of windows. The optimal amount of overlap should be found experimentally, but a good
value is half of the window size.
Algorithm 5-5: Spectrogram
Input: T = EM traces; n = number of EM traces; p = number of time windows; w = number of time points within a time window.
Output: V = spectrogram of EM traces.
HAMMING() = Hamming spectral window function.
SPECGRAM(T):
1: for each trace i, i ∈ {0, 1, …, n−1}, and for each window b, b ∈ {0, 1, …, p−1}:
2:   F ← FFT(Ti(b·w : (b+1)·w − 1))
3:   F ← first w/2 + 1 points of F
4:   Vi,b ← F • HAMMING(w/2 + 1)
5: end for
6: return V
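A minimal spectrogram along the lines of algorithm 5-5 can be sketched as follows. Window placement, the use of magnitude spectra, and the exact taper are my assumptions:

```python
# Spectrogram of one trace: slide a w-point window across the trace, take the
# magnitude of the real FFT (w/2 + 1 frequency points, per Nyquist), and taper
# with a Hamming window as the text describes. `overlap` shifts windows by
# less than their full width to avoid losing features at window edges.
import numpy as np

def spectrogram(trace, win, overlap=0):
    step = win - overlap
    rows = []
    for start in range(0, len(trace) - win + 1, step):
        spec = np.abs(np.fft.rfft(trace[start:start + win]))
        rows.append(spec * np.hamming(len(spec)))
    return np.array(rows)
```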
5.4 Attack Strategy on Known Point Operation
Algorithms such as double-and-add-always [Cor99] and the Montgomery ladder [Mo97]
have a consistent sequence of point operations. While this prevents simple analysis, it
aids differential analysis, as the locations of the point operations can be identified and the
attacker can focus on the double operations. A differential analysis on the (n+1)th
ECCDbl operation, using the MSB of an input coordinate for trace splitting, can reveal
the nth key bit. The attacker can apply this strategy iteratively to recover the entire key.
5.5 Proposed Attack Strategy on Unknown Point Operation
Jacobi form and Hesse form elliptic curves [LS01] [JQ01] use the same algorithms for
point addition and doubling. Not only does this approach prevent simple analysis, it also
makes differential analysis more difficult: an attacker can only perform differential
analysis on double operations, as no addition operation is performed for key bits of zero.
Given that the attacker knows the mth operation to be the double operation for the nth
key bit, the attacker can guess that the nth key bit is one, and therefore assume that the
(m+2)th operation is the double operation for the (n+1)th key bit. Performing differential
analysis on the (m+2)th operation then verifies whether the nth key bit is 1. It is not
useful to perform differential analysis on the (m+1)th operation, since its input is the
output of the mth operation and is the same regardless of the nth key bit value.
The attacker can apply this strategy iteratively, starting from the first operation,
which is known to be the double operation for the first key bit, as shown in figure 5-7.
Given that the mth operation is ECCDBL for the nth key bit, where Q is the output of the ECCDBL and P is the base point:
If the nth bit is 0: the (m+1)th operation is ECCDBL with input Q; the (m+2)th operation is unknown, with input 2Q.
If the nth bit is 1: the (m+1)th operation is ECCADD with input Q; the (m+2)th operation is ECCDBL, with input Q+P.
Figure 5-7: Attack Strategy on Unknown Point Operation
5.6 Proposed Attack Strategy on Window Method
Elliptic curve scalar multiplication can be implemented more efficiently with a window-
based method, which operates on the scalar multiplier in base 2^w for some window size
w greater than one. The window method works in a similar way to standard scalar
multiplication, except that the point operations are applied to one scalar digit d_i of w
bits at each iteration of the algorithm. The multiplication of a digit d_i by the base point
is pre-computed and stored in a lookup table.
The window method is vulnerable to differential analysis, as an attacker can
capture side channel traces from the double operation and partition them using the MSB
of the input point. However, the attack is made more difficult because the input of the
double operation depends on w key bits, and the attacker must try all 2^w possible values
of those bits to find the values that produce the strongest differential signal.
[Bar chart: differential signal magnitude for key bit guesses 0x00, 0x01, 0x10 and 0x11, with 0x10 clearly the highest]
Figure 5-8: Differential Signal for Different Key Bits
Figure 5-8 shows the differential signals resulting from different key bit guesses,
with w = 2 in this window method. Clearly, the key bits 0x10 lead to the highest
differential signal, and these bits are most likely to be correct.
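The digit search can be sketched as follows; score() is a hypothetical callback standing in for a full DEMA pass over the traces:

```python
# Window-method key search: score each of the 2^w candidate digit values by
# the peak of its differential signal and keep the strongest candidate.
def best_digit(score, w):
    return max(range(2**w), key=score)

# toy scores for a 2-bit window, mirroring figure 5-8: digit 0b10 peaks highest
peaks = {0b00: 1.1, 0b01: 0.9, 0b10: 4.2, 0b11: 1.3}
guess = best_digit(peaks.__getitem__, 2)
```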
6 Proposed Methodology of SEMA
The goal of SEMA is to recover the secret scalar by identifying the sequence of ECC
addition and double operations. This chapter describes an innovative approach to
classifying the point operations from EM signals using the AI neural network
programming paradigm. Neural networks are used in many intelligent voice recognition
and image recognition systems; other applications include the detection of medical
phenomena, stock market prediction, and engine management [S96]. However, this is the
first known use of AI neural networks for cryptanalysis.
Neural networks are effective at solving difficult problems for which there are no
algorithmic solutions or simple mathematical structures in the input data [S96], and the
recognition of complex EM signals belongs to this class of difficult problems. In contrast,
the template attack assumes the noise can be modeled as Gaussian and classifies the
operations using optimal signal detection and estimation theory. Thus far, that approach
has been assumed to be the optimal way to classify operations.
To evaluate the effectiveness of the neural network approach, it is compared to the
template attack approach; this comparison forms the basis for using neural networks and
is discussed in the first section of this chapter. The second section provides an overview
of neural network structures. The remaining sections describe the preprocessing, training
and classification procedures in neural network systems.
6.1 Motivation of using Neural Network
A template comprises signal and noise models of an operation, and is developed from
signals acquired over many executions of that operation. The signal component is the
average signal, and the noise component is a multivariate Gaussian model of the noise.
However, a template is a poor representation of an EM signal when there is a large
amount of jitter (random time shifts) in the signals, as illustrated in figures 6-1 and 6-2
below.
[Figure: two time-shifted signal samples from operation A and their average]
Figure 6-1: Signal Component of Operation A Template
[Figure: two time-shifted signal samples from operation B and their average]
Figure 6-2: Signal Component of Operation B Template
Each figure shows two signal samples of the same operation along with the signal
average. It is obvious that the two samples are identical except that one is a time shift of
the other. The signals from operation A are clearly different from those of operation B;
in particular, those from operation A have a lower frequency and have durations in which
the signal is unchanging.
However, the signal averages, that is, the signal components of the two templates,
are identical. The noise components due to jitter would not provide any useful
identification information either. Therefore, the template approach of modeling signals
fails in the presence of jitter.
In general, a template model performs poorly whenever it encounters EM signal
samples that are not a simple linear combination of a fundamental EM signal and
Gaussian noise. The EM signal produced by a mobile device is not consistent across
executions, due to the software runtime environment (section 7.2) and the trigger signal
(section 7.4). Hence the noise can be thought of as a random non-linear transformation
of the fundamental EM signal.
It is conceivable that a very elaborate model could be developed for the EM signal
produced by the addition and double operations. The construction of such a model
would require detailed models of the PDA run-time environment and of the trigger signal.
A model of the trigger signal is relatively easy to create; a model of the PDA run-time
environment is extremely challenging to construct. Therefore, there is no simple model
describing the EM signals generated by the addition and double operations. In this
thesis, the proposed methodology instead uses AI neural networks, commonly deployed
in speech and image recognition, to classify addition and double operations from EM
signals.
A neural network is modeled after the biological neural system in our brains [B95].
Real brains, however, are orders of magnitude more complex than any existing artificial
neural network. Two main characteristics of neural networks make them suitable for this
research problem. First, neural networks are capable of modeling extremely complex
and non-linear functions [R96]. Linear modeling is often used in engineering because it
has well-known optimization strategies; however, in situations where linear
approximations are not valid, linear models fail. Secondly, neural networks are relatively
easy to implement: they are capable of learning how to do their task given the right
training data, and the algorithms and data structures of neural networks are portable
across vastly different applications.
There are two general classes of neural networks: unsupervised networks and
supervised networks. An unsupervised network adapts purely in response to its inputs.
Such networks can learn to pick out structure in their input, and are useful for data
clustering and for reducing the dimensionality of data [B95]. For this application,
supervised networks are more useful. A supervised network is trained with sample inputs
and the desired results from the point operations.
A neural network can accept only a limited number of inputs. However, each
acquisition of an EM signal has tens of thousands of sample points with considerable
redundancy. The solution is to preprocess the data using a signal envelope and data
compression. The preprocessed data can then be used for training and classification with
the neural network.
6.2 Neural Network Structure
A neural network consists of many simple computational units called neurons that are
organized in layers, as shown in figure 6-3 below.
Figure 6-3: Neural Network Structure [S96]
An artificial neuron is similar to its biological counterpart; it receives a number of
(scalar) inputs and produces an output. The neurons are connected in the sense that the
outputs of one layer of neurons are fed to the inputs of the next layer. More precisely,
a neuron of layer i sends its output to every neuron of layer i+1; conversely, a neuron of
layer i+1 receives an output from every neuron of layer i.
Each connection between neurons is associated with a weight, and each neuron
has a bias value. The activation input value is formed from the weighted sum of the
inputs plus the bias. The scalar output of the neuron is a transfer function of the
activation input. This computation of a layer of neurons is shown mathematically as
follows:
A = f(WP + B)

where
  P = input vector from layer i−1
  A = output vector of layer i
  W = connection weight matrix between layers i−1 and i
  B = neuron bias vector of layer i
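The layer computation A = f(WP + B) can be sketched with NumPy; the shapes and the toy weights here are assumptions:

```python
# One layer of the network: A = f(WP + B), with f the Tan-Sigmoid (tanh).
import numpy as np

def layer(P, W, B):
    return np.tanh(W @ P + B)

# toy layer: two inputs feeding a single neuron
W = np.array([[0.5, -0.5]])
B = np.array([0.0])
```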
The activation function used is the Tan-Sigmoid transfer function, shown below [MW].
Figure 6-4: Tan-Sigmoid Transfer Function [MW]
A single neuron can accomplish very little. However, a large number of neurons
organized in a multi-layer network can have great power.
The neural network used here consists of four layers of neurons, as is typical of
neural networks for pattern recognition. The neurons at the first layer (input layer)
receive the preprocessed EM signals as their input. There are two hidden layers. The last
layer (output layer) has only one neuron: if the output of this neuron is a positive number,
the signal is classified as a double operation; if the output is negative, it is classified as an
addition operation. The optimal number of neurons for the other layers is found
experimentally, and the procedure and outcome are described in section 9. In general, a
network with more neurons is better able to model complex patterns. However, it also
requires more training data and is more computationally demanding.
6.3 Preprocessing
There are basically three steps of preprocessing of the input data: taking signal envelope,
scaling of signal envelope, and compression of the scaled envelope. There are two goals
of input preprocessing. The first goal is to reduce the number of inputs, which reduce the
complexity of the neural network and the number of training data required. The second
goal is to improve the recognition quality.
[Figure: a fluctuating EM signal overlaid with its envelope]
Figure 6-5: Signal Envelope
An envelope of a sample EM signal is shown in figure 6-5. Each segment in the
envelope represents the maximal value of a four-point window of the original signal.
This effectively reduces the signal to a quarter of its original size. Although the envelope
reduces the information content of the original signal, local signal fluctuation does not
contribute to recognition quality, as an EM voltage signal naturally fluctuates between
positive and negative values. One may be tempted to avoid the envelope and instead
reduce the resolution appropriately. However, the envelope is important because it
provides the approximate positive amplitude of the signal. Moreover, it is not a good
idea to use the average value instead of the maximal value, as the average value of an
EM signal is approximately zero.
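The envelope step can be sketched as follows. This is my implementation, not the thesis code, and it drops any ragged tail shorter than one window:

```python
# Envelope extraction: the maximum of each non-overlapping four-point window,
# shrinking the trace to a quarter of its length.
import numpy as np

def envelope(signal, win=4):
    signal = np.asarray(signal, float)
    n = len(signal) // win * win        # drop a ragged tail, if any
    return signal[:n].reshape(-1, win).max(axis=1)
```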
The second step is to scale the training data such that the network inputs are
normalized to zero mean and unit standard deviation. This is shown in the example
below, where each column represents a signal and each row of a column represents a
signal point.
[Worked example: a small matrix of integer signal values and its normalized counterpart with zero mean and unit standard deviation]
The means and standard deviations of the original inputs are stored and used to
transform future inputs to the network. Stated mathematically:

Let
  M = mean vector of the training input
  S = reciprocal of the standard deviation vector of the training input
  B = new input vector
  A = new input vector after scaling

A = (B − M)S
Although this process does not reduce the data size, normalization is vital to
increasing recognition accuracy. This is partly due to the activation function used in the
neurons, which is most sensitive to inputs between −1 and +1.
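The scaling step, with the stored training statistics reused on new inputs, can be sketched as follows; the layout with rows as signal points and columns as signals is my reading of the text:

```python
# Normalize each signal point to zero mean and unit standard deviation across
# the training signals; store M and the reciprocal std S to scale new inputs.
import numpy as np

def fit_scaler(train):                      # rows = signal points, cols = signals
    M = train.mean(axis=1, keepdims=True)
    S = 1.0 / train.std(axis=1, keepdims=True)
    return M, S

def scale(B, M, S):
    return (B - M) * S                      # A = (B - M) S
```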
The final step compresses the input data using principal component analysis.
Principal components define projections that capture the maximum amount of variation
in a dataset, each orthogonal to the previous principal components of the same dataset
[YR01]. The technique is useful in applications where the dimension of the input vector
is large but the components of the vector are highly correlated. This applies to EM signal
vectors: each vector contains tens of thousands of signal points, yet there are considerable
correlations between the points. Principal component analysis is used to reduce this
redundancy in the input vector.
There are two steps to principal component analysis. The first step is to
orthogonalize the components of the input vector so that they become uncorrelated, as
illustrated in the transformation below.
[Worked example: the normalized matrix is transformed so that each row becomes an uncorrelated (orthogonal) component]
Each column still represents a signal; however, each row now represents an orthogonal
component. The rows are ordered so that the orthogonal components (principal
components) with the largest variation appear in the top rows. The final step eliminates
the orthogonal components that contribute less than a predefined fraction of the total
variation in the set. This is illustrated in the transformation below, where the orthogonal
components contributing less than 30% (in this case, the last row) have been eliminated.
This reduces the amount of input data by removing the components that carry the least
information.
[Worked example: the last row of the orthogonalized matrix, which contributes the least variation, is removed]
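Both PCA steps, orthogonalizing and then dropping low-variation components, can be sketched via a singular value decomposition. This is my sketch; the 30% threshold is the example value from the text:

```python
# PCA compression: project the (centered) signal points onto the principal
# directions, ordered by variance, then keep only the components contributing
# at least `frac` of the total variation.
import numpy as np

def pca_compress(X, frac=0.3):              # rows = signal points, cols = signals
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U.T @ Xc                            # rows = principal components
    var = s**2 / np.sum(s**2)               # fraction of variation per component
    return Y[var >= frac]
```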
6.4 Training
The type of neural network used for this application is a backpropagation network.
Backpropagation is a gradient descent algorithm that adjusts the network weights along
the negative gradient of the performance function, as illustrated in figure 6-6 [ANC].
The term backpropagation refers to the manner in which the gradient is computed for
non-linear multi-layer networks.
Figure 6-6: Gradient Descent Algorithm [ANC]
Training a neural network requires a set of input vectors and a target vector.
In this application, the input vectors are the preprocessed EM signals. The target vector
comprises ones and negative ones, representing double and addition operations
respectively.
The standard steepest descent algorithm is often too ineffective for most
applications. There are two main categories of improvements. The first category uses
heuristic techniques developed from analysis of the performance of the standard steepest
descent algorithm. One heuristic is to add momentum to the gradient descent; this
enables the network to respond not only to the local gradient but also to recent trends on
the error surface, and prevents the network from getting stuck in a shallow local
minimum that would yield a locally optimal (but globally suboptimal) solution [MW].
The second category uses numerical optimization techniques for neural network training.
The training techniques tested experimentally are CGB, CGP, GDX, SCG and RP; they
are described in [MW].
The distribution of training data should reflect the distribution of the actual test data.
For example, in scalar multiplication based on the binary expansion of the scalar, the
double operation occurs approximately twice as often as the addition operation.
Therefore, one third of the training data should come from addition operations and two
thirds from double operations. The main reason for using the same distribution is to
ensure that the means and standard deviations of the training data are approximately the
same as those of the test data, which is important for proper preprocessing. The second
reason is that it is advantageous for the network to decide on the more probable outcome
when it encounters an ambiguous input. Here, an ambiguous input is a signal that does
not fit the usual patterns of either the addition or the double operation, and is hence
difficult to classify. When the network is trained with more data from double operations,
it is more likely to classify the source of an ambiguous signal as a double operation.
6.5 Classification
The preprocessed EM signal is fed into the input layer of the neural network. The values
are propagated through the layers of the network to the output layer. The output layer
has only one neuron, whose output lies between +1 and −1. If the output is a positive
number, the source of the EM signal is classified as a double operation; otherwise, it is
classified as an addition operation.
6.6 Combination of Classification Results
The classification results are numbers between +1 and −1. An adversary can greatly
enhance the classification accuracy by executing the multiplication algorithm with the
same scalar more than once and combining the classification results over all executions
by averaging. As more executions are used, the average classification result becomes
more consistent and accurate. This is shown mathematically below.
Let X be the random variable of the classification result, and Xi the classification result
on the ith execution. Assuming the Xi are independent, each with the variance of X:

VAR((X1 + X2 + … + Xn) / n)
  = (1/n²) VAR(X1 + X2 + … + Xn)
  = (1/n²) (VAR(X1) + VAR(X2) + … + VAR(Xn))
  = VAR(X) / n
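A quick simulation, my own and with arbitrary Gaussian parameters, confirms the 1/n variance reduction:

```python
# Averaging n independent classification results divides the variance by n.
import random

random.seed(0)
n, trials = 16, 20_000
single = [random.gauss(0.4, 1.0) for _ in range(trials)]
avgd = [sum(random.gauss(0.4, 1.0) for _ in range(n)) / n for _ in range(trials)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / len(xs)

ratio = var(single) / var(avgd)  # should be close to n
```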
The variance of the classification result thus varies inversely with the number of
samples, as demonstrated graphically in figure 6-7. The figure shows the classification
result distribution for a double operation, whose expected classification value lies
between 0 and +1. When results are combined over many executions, the distribution is
concentrated around the mean value, and the probability of obtaining a result less than
zero (a false classification) is low. With the result of a single classification, however, the
distribution is spread over a wide range and there is a significant probability that the
result is less than zero.
[Figure: classification result distributions; the many-iteration distribution is narrow around the mean, while the one-iteration distribution is spread over a wide range]
Figure 6-7: Effect of Combination of Classification Results
6.7 Integer Optimization Model
An integer programming model can be used in conjunction with result combination to
enhance the neural network classification system. There are two pieces of information an
adversary can exploit: first, an add operation is always followed by a double operation;
second, the number of double operations equals the number of scalar bits. These can be
formulated as constraints in an integer programming model.
The inputs to this programming model are the classification results, values between
−1 and +1, for the point operations in the algorithm. The absolute value of a result
represents certainty; one is more certain that an operation is an add if the result is −0.9
rather than −0.5. The certainty information is factored into the programming model.
The decision variables are binary assignment variables; a variable is 1 if an operation
is assigned as a double, and −1 otherwise. The objective function of the model is to
maximize the total certainty of the assignments, stated mathematically as:

maximize Σi xi·yi

where x is the assignment vector and y is the classification result vector.
The GAMS model is shown in Appendix C. Intuitively, the programming model
picks out the most likely sequence of operations that is feasible for this multiplication
algorithm. The table below shows how the integer optimization model may correct
classification errors.
Table 6-1: Error Correction with Integer Optimization Model

    Operation        1     2     3     4     5
    Actual Op.       Dbl   Add   Dbl   Dbl   Add
    Class. Value     +0.9  -0.7  -0.2  +0.8  +0.5
    Class. Result    Dbl   Add   Add   Dbl   Dbl
    Int. Op. Result  Dbl   Add   Dbl   Dbl   Dbl
The entries in boldface represent incorrect classifications. The neural network
system misclassifies operations 3 and 5. The integer optimization model is able to
correct the error at operation 3, since an addition operation followed by another addition
operation violates the model constraints, and classifying that operation as a double
operation yields the optimal solution. However, the integer optimization model fails to
correct the error at operation 5, as that error does not violate any model constraint.
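The error-correcting behaviour of Table 6-1 can be illustrated with a small sketch. The class and method names below are ours, not the thesis's, and only the first constraint (no add followed by add) is modelled; the GAMS model in Appendix C also enforces the number of double operations. For a short window of operations, an exhaustive search over feasible assignments stands in for the integer program:

```java
/** Illustrative stand-in for the GAMS model: brute-force search over valid
 *  operation sequences (+1 = Dbl, -1 = Add) maximizing sum(x_i * y_i).
 *  Constraint modelled: an Add must be followed by a Dbl. */
public class OpSequenceCorrector {
    public static int[] correct(double[] y) {
        int n = y.length;                       // feasible only for short windows
        int bestMask = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean valid = true;
            double score = 0;
            for (int i = 0; i < n; i++) {
                int x = ((mask >> i) & 1) == 1 ? 1 : -1;  // 1 = Dbl, -1 = Add
                if (x == -1 && i + 1 < n && ((mask >> (i + 1)) & 1) == 0)
                    valid = false;              // Add followed by Add is infeasible
                score += x * y[i];
            }
            if (valid && score > bestScore) { bestScore = score; bestMask = mask; }
        }
        int[] x = new int[n];
        for (int i = 0; i < n; i++) x[i] = ((bestMask >> i) & 1) == 1 ? 1 : -1;
        return x;
    }
}
```

Applied to the classification values of Table 6-1, this search reproduces the corrected sequence Dbl, Add, Dbl, Dbl, Dbl.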
7 Experimental Setup and Methodology
This chapter describes the experimental setup and methodology that are generally
applicable to SEMA and DEMA against ECC computations on PDA devices. It begins
by describing the target hardware and software platforms used for the experiments, then
describes the implementation of the target ECC system used in the attacks. The
remainder of the chapter describes the measurement setup and oscilloscope
configuration.
7.1 Target Hardware Platform
The target hardware is a PDA with cell phone, Internet, and email capabilities. This
platform is chosen because it is widely used in industry and serves multiple purposes. To
protect confidentiality, the make of the PDA is omitted from this thesis. Furthermore,
the techniques and analysis illustrated in this thesis are not limited to this particular
hardware platform.
The PDA operates at 56 MHz and is powered by a 32-bit Intel 386 processor. It
contains 16 MB flash memory and 2 MB SRAM. The back of this PDA is removed to
expose the processor to EM signal probing.
7.2 Target Software Platform
The target hardware supports the Java 2, Micro Edition (J2ME) runtime environment,
an optimized environment designed to enable Java applications to run on small
computing devices such as PDAs. The maker of this PDA provides a software
development kit (SDK) for programmers to develop third-party Java applications for
this device.
The target ECC program is written in Java. Implementing the ECC program in
Java for this research presents a unique challenge. The Java programming language
differs from other languages in that a program is both compiled and interpreted. The
program is first compiled to an intermediate language called Java bytecode. At
runtime, the bytecode is interpreted by the Java Virtual Machine (JVM) and translated
into binary instructions to be executed on the PDA's CPU, as illustrated in figure 7-1.
Figure 7-1: Java Runtime Environment
Side channel analysis relies on the correlation of side channel signals with the
data and code being executed. However, during execution of a Java cryptographic
program, a very large portion of the computation time is spent interpreting the Java
program rather than executing it. Therefore a large portion of the side channel
signals, whether power or EM, has no correlation with the program data and code being
executed; the JVM effectively creates a very noisy environment. Furthermore, the OS
on the PDA may perform memory management, handle interrupts, and switch contexts
during a signal acquisition, introducing a tremendous amount of distortion into the
captured EM signals. These factors make side channel analysis very difficult.
Another, more subtle problem is that a Java program takes much longer to execute,
which requires a longer acquisition interval. With the limited memory of a digital
oscilloscope, this often forces the use of a lower sampling frequency.
7.3 ECC Program Implementation
The program is not a complete cryptographic system capable of encryption, decryption,
authentication, and so on. Instead, only the dominant operation in ECC, scalar
multiplication, is implemented.
Before implementing the ECC program, several choices have to be made
regarding the selection of elliptic curve domain parameters. These include the choice of
underlying finite field, field representation, and elliptic curve. These choices in turn
influence the selection of algorithms for field arithmetic and elliptic curve arithmetic
[BHL00].
For implementation on a handheld computing device, the use of elliptic curves
defined over prime fields, GF(p), can yield better performance, as general-purpose
processors provide optimized hardware for integer computations. In contrast, elliptic
curves over binary fields, GF(2^n), require many bit-wise operations that are hard to
implement in software and perform poorly on general-purpose processors.
An elliptic curve E over F_p is specified by the coefficients a, b ∈ F_p of its defining
equation y^2 = x^3 + ax + b. The NIST curves all have a = -3 because this yields a faster
algorithm for point doubling when using Jacobian coordinates. The number of points on
E defined over F_p is nh, where n is prime and h is the co-factor. A random curve over F_p,
where p is an m-bit prime, is denoted by P-m.
The NIST-recommended prime field curve P-192 [NIST] is used in this
implementation. The parameters of this curve are shown in table 7-1. There are many
advantages to using NIST curves. First, the NIST-recommended curves are
designed to be computationally efficient on general-purpose processors. In particular, the
NIST primes have low Hamming weight in binary representation, and all the one bits
occur at positions that are multiples of 32. This permits very fast modular reduction
algorithms in software. Second, NIST provides the parameters (a, b) for curves that have
been studied extensively and are believed to be secure against all common attacks. As
well, it provides a suitable reduction polynomial for the underlying field that allows an
efficient implementation. Furthermore, it provides the order of the curve, which is the
number of points on the curve and is very difficult to compute; the order is important for
some power analysis countermeasures and for testing purposes. Finally, NIST also
provides a base point on the curve, which is also hard to find.
Of all the NIST-recommended curves over prime fields, the P-192 curve has the
fewest bits, and is thus the most suitable for resource-constrained devices such as PDAs.
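The fast modular reduction enabled by the low-Hamming-weight NIST primes can be sketched for p = 2^192 - 2^64 - 1, which satisfies 2^192 ≡ 2^64 + 1 (mod p). The class below is an illustrative BigInteger version of the standard word-wise reduction, not the thesis's implementation:

```java
import java.math.BigInteger;

/** Illustrative fast reduction modulo the P-192 prime p = 2^192 - 2^64 - 1. */
public class P192Reduce {
    public static final BigInteger P = BigInteger.ONE.shiftLeft(192)
            .subtract(BigInteger.ONE.shiftLeft(64)).subtract(BigInteger.ONE);
    private static final BigInteger MASK64 =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    /** Reduce c < p^2 modulo p with shifts, adds, and a few subtractions. */
    public static BigInteger reduce(BigInteger c) {
        BigInteger[] w = new BigInteger[6];                 // 64-bit words c0..c5 of c
        for (int i = 0; i < 6; i++) w[i] = c.shiftRight(64 * i).and(MASK64);
        // Fold the high words down using 2^192 = 2^64 + 1 (mod p):
        BigInteger t  = w[2].shiftLeft(128).or(w[1].shiftLeft(64)).or(w[0]); // (c2,c1,c0)
        BigInteger s1 = w[3].shiftLeft(64).or(w[3]);                         // (0,c3,c3)
        BigInteger s2 = w[4].shiftLeft(128).or(w[4].shiftLeft(64));          // (c4,c4,0)
        BigInteger s3 = w[5].shiftLeft(128).or(w[5].shiftLeft(64)).or(w[5]); // (c5,c5,c5)
        BigInteger r = t.add(s1).add(s2).add(s3);
        while (r.compareTo(P) >= 0) r = r.subtract(P);      // at most a few subtractions
        return r;
    }
}
```

A word-array implementation would use the same folding pattern on 32-bit words; BigInteger is used here only to keep the sketch short.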
Table 7-1: Elliptic Curve P-192 Parameters [NIST]

    Number of bits (m)  192
    Prime order (n)     0x FFFFFFFF FFFFFFFF FFFFFFFF 99DEF836 146BC9B1 B4D22831
    Co-factor (h)       1
    Polynomial          f(x) = x^192 - x^64 - 1
    Parameter a         -3
    Parameter b         0x 64210519 E59C80E7 0FA7E9AB 72243049 FEB8DEEC C146B9B1
    Base point x        0x 188DA80E B03090F6 7CBF20EB 43A18800 F4FF0AFD 82FF1012
    Base point y        0x 07192B95 FFC8DA78 631011ED 6B24CDD5 73F977A1 1E794811
The coordinates of an elliptic curve point are elements of F_p, that is, integers
between 0 and p-1. Assuming that m is the bit size of elements of F_p and each word is
32 bits long, a coordinate is represented as an array of m/32 words. The NIST primes are
chosen such that m is always divisible by 32. The algorithms for field arithmetic used
in this implementation are given in [BHL02]. The API of the Java class for field
arithmetic is given in Appendix A.
The standard representation of a point on an elliptic curve E is in affine
coordinates, P = (x, y), satisfying the (affine) equation y^2 = x^3 + ax + b. However,
the use of affine coordinates requires finite field inversion, a very computationally
expensive operation that takes 10 to 100 times longer than multiplication. It is
therefore advantageous to avoid inversions by representing points in projective
coordinates, of which several types have been proposed. In standard projective
coordinates, the projective point (X : Y : Z), Z ≠ 0, corresponds to the affine point
(X/Z, Y/Z), and the projective equation of the elliptic curve is Y^2 Z = X^3 + aXZ^2 + bZ^3.
In Jacobian projective coordinates [CC87], the projective point (X : Y : Z), Z ≠ 0,
corresponds to the affine point (X/Z^2, Y/Z^3), and the projective equation of the curve is
Y^2 = X^3 + aXZ^4 + bZ^6. Jacobian projective coordinates are used here as they yield the
best overall performance. The algorithms for elliptic curve point addition and doubling are
given in [BHL02]. The API of the Java class for elliptic curve arithmetic is given in
Appendix B.
The base point and the points in the lookup table are given in affine coordinates, while all
intermediate points during scalar multiplication are represented in Jacobian coordinates.
The final result must be converted from Jacobian coordinates back to affine coordinates.
The two point operations used most often are DblJJ and AddJAJ. DblJJ is a
double operation with a Jacobian input and a Jacobian output. AddJAJ is an add
operation with one Jacobian input, one affine input, and a Jacobian output. The
computational costs of the different point operations are given in table 7-2. The target of
the DEMA experiment is the DblJJ operation in the scalar multiplication algorithm.
Table 7-2: Costs of point operations [BHL00]

    Doubling              General addition         Mixed coordinates
    2A -> A  1I, 2M, 2S   A + A -> A  1I, 2M, 1S   J + A -> J   8M, 3S
    2P -> P  7M, 3S       P + P -> P  12M, 2S      J + C -> J  11M, 3S
    2J -> J  4M, 4S       J + J -> J  12M, 4S      C + A -> C   8M, 3S
    2C -> C  5M, 4S       C + C -> C  11M, 3S
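As a sketch of the 2J → J doubling cost in table 7-2, the standard Jacobian doubling formulas for a = -3 can be written as follows. This is our illustration using BigInteger, not the Java class from Appendix B; the M/S comments mark the four field multiplications and four squarings:

```java
import java.math.BigInteger;

/** Illustrative 2J -> J point doubling on P-192 (a = -3), costing 4M + 4S. */
public class DblJJ {
    public static final BigInteger P = new BigInteger(
            "fffffffffffffffffffffffffffffffeffffffffffffffff", 16); // P-192 prime

    /** Double the Jacobian point (X : Y : Z); returns {X3, Y3, Z3}. */
    public static BigInteger[] dbl(BigInteger x, BigInteger y, BigInteger z) {
        BigInteger z2 = z.multiply(z).mod(P);                          // S
        BigInteger m  = BigInteger.valueOf(3)
                .multiply(x.subtract(z2)).multiply(x.add(z2)).mod(P);  // M: M = 3(X-Z^2)(X+Z^2), uses a = -3
        BigInteger y2 = y.multiply(y).mod(P);                          // S
        BigInteger s  = x.multiply(y2).shiftLeft(2).mod(P);            // M: S = 4XY^2
        BigInteger x3 = m.multiply(m).subtract(s.shiftLeft(1)).mod(P); // S: X3 = M^2 - 2S
        BigInteger t  = y2.multiply(y2).shiftLeft(3).mod(P);           // S: T = 8Y^4
        BigInteger y3 = m.multiply(s.subtract(x3)).subtract(t).mod(P); // M: Y3 = M(S - X3) - T
        BigInteger z3 = y.shiftLeft(1).multiply(z).mod(P);             // M: Z3 = 2YZ
        return new BigInteger[]{ x3, y3, z3 };
    }
}
```

With Z = 1 these formulas agree with the affine doubling λ = (3x² - 3)/(2y), which makes the sketch easy to check against the P-192 base point.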
7.4 Measurement Setup and Technique
The EM emanation is received with a near-field probe that is positioned on top of the
processor and attached to a coaxial cable. After wide-band amplification, the EM signals
are captured on a digital phosphor oscilloscope. The LED on the PDA is used as a trigger
signal: it is programmed to turn on at the beginning of an ECC operation of
interest, which triggers the oscilloscope to capture one EM trace. The measurement
setup is illustrated in figure 7-2.
Figure 7-2: Measurement Setup
The EM-6992 near-field probe set from Electro-Metrics is used for this experiment.
The set contains two classes of EM probes: magnetic field (H-field) probes and electric
field (E-field) probes. The E-field probes, a ball probe and a stud probe, completely fail
to capture EM signals from the processor.
The H-field probes are electrically small (i.e., resonant frequency above 1 GHz)
loops of varying sensitivities [EM6992]. The loops are wound within a balanced Faraday
shield that reduces their response to electric fields to a negligible level. Each
successively larger loop increases sensitivity (independent of frequency) by approximately
12 to 15 dB. Probes of lower sensitivity are better at isolating an emission source
precisely. The magnetic probes are used in this experiment to capture EM signals
radiating from the PDA's processor.
The H-field probe is placed directly on the processor to capture the maximum amount
of EM signal, as shown in figure 7-3. The processor is a strong source of information-
dependent EM signals and is easily accessible from the back of the PDA. Other
locations, such as the memory module, may also produce strong EM signals. However,
since memory access is managed by the OS of the PDA, the memory access time is
unpredictable and differs for each run of the ECC algorithm; hence, it is not a reliable
EM side channel.
Figure 7-3: EM Probe
It is very important that the distance between the probe and the EM source is
minimized and kept consistent throughout all experiments. Even small changes in this
spacing can yield large variations in amplitude. In this experiment, the H-field probe is
placed directly on the processor, so the spacing is guaranteed to always be the same. In a
real attack, an adversary may not be able to keep the spacing constant; however, DSP
techniques may be employed to remove variations in signal amplitude due to variations
in spacing.
The H-field probe chosen is the smallest of the set, with a 1 cm loop. A smaller
probe has lower sensitivity, which helps to isolate an emission source more precisely.
As well, the probe is small enough to fit nicely on top of the processor, which is roughly
a 1 cm square. A bigger probe could pick up noisy signals radiating from other devices.
Although a smaller H-field probe has a lower cut-off frequency, the cut-off frequency is
much higher than what is needed in this experiment.
The amplifier used is a broadband preamplifier (Electro-Metrics EM-6990)
inserted in line between the H-field probe and the digital oscilloscope. It provides a
significant improvement over the measurement sensitivity of a typical spectrum
analyzer. Its frequency range is 5 kHz to 1200 MHz, with the cutoff frequencies at
-3 dB gain; the typical gain is 22 dB, and the noise figure is 6 dB [EM6990]. In this
experiment, the sampling rate is between 10 and 50 MHz, well within the frequency
range of the preamplifier and H-field probe.
The voltage across the terminals of the LED on the PDA is used as the trigger signal. In
software, the LED is programmed to turn on at the beginning of an operation of interest.
The rising edge of the LED signal triggers the oscilloscope to begin capturing one
trace of EM signals. The rising edge is used because the rise time is much shorter than
the fall time of the LED signal, which allows the oscilloscope to begin capturing
EM signals as soon as possible. Just as importantly, there is less variation in the rise
time of the LED signal, which reduces jitter, the random horizontal displacement of the
EM signals. However, there still exists a finite rise time in the LED signal, and this does
cause some difficulties in EM analysis.
The EM signals are captured and stored with a digital oscilloscope, a TDS7254
Digital Phosphor Oscilloscope from Tektronix [Tek]. The oscilloscope has many
powerful features; only those pertinent to this experiment are described in this thesis.
The configuration of the oscilloscope is described in the following section.
7.5 Oscilloscope Configuration
The standard way of capturing signals is to acquire a single trace from one execution of
the algorithm. This may be used initially to find the duration of the algorithm and the
distinguishing features of its EM signals. However, it is not sufficient for DEMA or
SEMA, as those experiments often require several hundred traces; more traces yield a
more consistent average signal and a more reliable result. The number of sample points
in each trace is set by the record length, and the number of traces is set by the frame
count.
Moreover, the captured traces must come from consecutive executions of the
algorithm running within a loop. This is the only way to ensure the traces being captured
correspond to the correct set of controlled input values. This mode of capturing multiple
consecutive traces is called fast frame.
Each frame is captured at the rising edge of the trigger signal from the PDA's
LED. However, due to noise, the oscilloscope occasionally fails to trigger properly on
the LED signal, so it is important to check that the number of fast frames captured
matches the number of times the algorithm is executed. Unfortunately, the oscilloscope
cannot detect over-triggering, i.e., acquiring invalid frames due to noise in the trigger
signal. Once the oscilloscope has acquired a preset number of frames, it stops acquiring
more. One trick to overcome this difficulty is to set the frame count on the oscilloscope
higher than the number of times the algorithm is executed. Once the device has
completed every iteration of the algorithm, one can stop the oscilloscope and check the
number of fast frames that were acquired.
Two acquisition modes are used in this experiment: sample mode and
peak detect mode. The input signal is always sampled at each acquisition interval, or
sampling period. In sample mode, the input signal is acquired at the beginning of each
acquisition interval. In peak detect mode, the highest and lowest input signal values are
captured at alternating acquisition intervals. It turns out that peak detect mode is
excellent for DEMA over time domain signals, as only the peak signals are important.
However, sample mode must be used for frequency analysis such as power spectral
density and spectrogram: the theory behind the discrete Fourier transform demands that
the time signal be sampled at regular intervals, so using other acquisition modes for
frequency analysis would lead to incorrect results.
Due to the memory constraints of the oscilloscope, there is a trade-off between the
number of fast frames and the number of sample points. The number of sample points in
turn depends on the frame duration and resolution. Given that the frame duration is fixed
to the duration of the algorithm, an attacker can choose to have a higher number of fast
frames or a higher resolution. A higher number of fast frames gives more consistent
results, whereas a higher resolution provides more detail in the EM signals.
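The trade-off amounts to simple arithmetic. The helper below is hypothetical (the oscilloscope's memory size is only implied by the settings reported later: 25K points per frame at 1300 frames):

```java
/** Illustrative arithmetic for the oscilloscope memory trade-off.
 *  The memory figure used in the test is implied, not specified, by the thesis. */
public class ScopeBudget {
    /** Sample points per frame for a given frame duration and sampling rate. */
    public static long recordLength(double frameSeconds, double sampleHz) {
        return Math.round(frameSeconds * sampleHz);
    }
    /** Maximum number of fast frames that fit in the acquisition memory. */
    public static long maxFrames(long memorySamples, long recordLength) {
        return memorySamples / recordLength;
    }
}
```

For example, a 2 ms frame at 12.5 MHz needs 25,000 points per frame; doubling the sampling rate would halve the number of fast frames that fit.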
8 Experimental Results of DEMA
The focus of the DEMA experiment is to attack the ECC point double operation and
thereby recover the secret scalar in the multiplication algorithm. The first section
describes the experimental setup of DEMA. The second section provides results of the
proposed trace splitting and shows that the MSB of the input point coordinate is the best
partition bit. The next three sections show the DEMA results for time domain, power
spectral density, and spectrogram signals. The final section compares the results of the
three signal types.
8.1 Setup
The EM emanation is captured with 25K sample points at a 12.5 MHz sampling rate
over a 2 ms frame duration and 1300 fast frames, the maximum utilization of the
oscilloscope memory. The double operation takes about 18 ms; the scope is configured
to extract the EM signals from 2 ms to 4 ms of the double operation, where the most
noticeable differential signals are found. The use of 1300 fast frames gives very good
consistency, and using more fast frames does not appear to produce better experimental
results. Signals acquired with sample mode are used for frequency and spectrogram
analysis; those from peak detect mode are used for time domain analysis.
DEMA requires partitioning the EM signals based on a partition bit value. An early
experiment applied two consecutive batches of inputs, with the first batch having a bit
value of 0 and the second batch a bit value of 1. However, it was found that whenever
the outputs of the first n executions are grouped into one set and the outputs of the
second n executions into another, there is a significant group difference between them
regardless of the bit values (a false positive). This is because the average of the EM
signals fluctuates slowly over time, possibly due to other EM sources in the environment
or some other change in conditions within the device.
A second approach, which also fails, is to produce inputs with alternating values in
the partition bit. It is found that when one groups the odd executions (1, 3, 5, ...) into one
set and the even executions (2, 4, 6, ...) into another, there is also a significant group
difference regardless of the bit value. The cause of this group difference is unclear;
perhaps the least significant bit of the loop counter affects the signals being generated.
Either way, the sets from even and odd executions exhibit a group difference regardless
of the input bit value, which leads to false positives.
The final solution is to generate the inputs using a pseudorandom generator and a
fixed seed. This is cumbersome, as the analysis program must use the same
pseudorandom generator and seed to recover which bit value is used at each iteration of
the algorithm. However, this is the only way to ensure that there are no false positives in
the final result.
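A minimal sketch of the fixed-seed approach, assuming java.util.Random as the shared pseudorandom generator (the thesis does not name its generator, and the class and seed here are illustrative):

```java
import java.util.Random;

/** Sketch: device and analysis program regenerate the same partition-bit
 *  sequence from a shared seed, so traces can be partitioned after the fact. */
public class PartitionBits {
    /** Partition-bit value used at each iteration, reproducible from the seed. */
    public static int[] bits(long seed, int iterations) {
        Random prng = new Random(seed);           // same seed on both sides
        int[] b = new int[iterations];
        for (int i = 0; i < iterations; i++) b[i] = prng.nextBoolean() ? 1 : 0;
        return b;
    }
}
```

The acquisition program would build the input for iteration i so that the partition bit equals b[i]; the analysis program replays bits() with the same seed to split the captured traces into the two sets.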
In DEMA one should check that the differential signal from a correct partition is
much stronger than that of an incorrect partition; this is the only way to verify the
effectiveness of DEMA. An incorrect partition is obtained by performing the analysis
using the wrong seed for the pseudorandom generator.
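The partition-and-subtract step itself can be sketched as follows; the class name and the difference-of-means formulation are our reading of DEMA, not the thesis's analysis code:

```java
/** Sketch of the DEMA differential: split traces by partition bit and
 *  subtract the two average traces, D[t] = mean(set1)[t] - mean(set0)[t]. */
public class Dema {
    public static double[] differential(double[][] traces, int[] bits) {
        int len = traces[0].length;
        int n0 = 0, n1 = 0;
        for (int bit : bits) { if (bit == 1) n1++; else n0++; }
        double[] diff = new double[len];
        for (int t = 0; t < len; t++) {
            double s0 = 0, s1 = 0;
            for (int i = 0; i < traces.length; i++) {
                if (bits[i] == 1) s1 += traces[i][t]; else s0 += traces[i][t];
            }
            diff[t] = s1 / n1 - s0 / n0;   // difference of the two set means
        }
        return diff;
    }
}
```

With a correct partition, points where the signal correlates with the partition bit survive the subtraction as peaks; with a wrong seed, the two sets are statistically identical and the differential trace collapses toward zero.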
8.2 Results of Trace Splitting
Figures 8-5, 8-1, and 8-2 show the results of differential EM analysis with correct bit
partitioning using the first, second, and third most significant bit respectively. Clearly
the amplitude of the differential signal diminishes significantly as the partitioning bit
moves away from the MSB. This is because the chance of a carry-out is not as high
when the 2nd or 3rd MSB is one; hence the probability of a carry-out does not correlate
as closely with bits other than the MSB. This demonstrates the impact of carry-outs in
sub-operations within point doubling on the resulting EM signals, and shows that the
MSB is the most suitable partition bit. Figures 8-11, 8-3, and 8-4 illustrate a similar
situation with the spectrogram: they show the results of differential spectrogram analysis
with correct bit partitioning using the first, second, and third most significant bit
respectively.
Figure 8-1: Differential signal for correct bit partitioning on 2nd MSB
Figure 8-2: Differential EM signal for correct partitioning on 3rd MSB
Figure 8-3: Differential EM spectro for correct partitioning on 2nd MSB
Figure 8-4: Differential EM spectro for correct partitioning on 3rd MSB
8.3 Results of Time Domain Analysis
Figure 8-5 shows the differential EM signal when the correct scalar bit is chosen in
DEMA. In contrast, figure 8-6 shows the same analysis when an incorrect scalar bit is
chosen. The 3 SD (standard deviation) and -3 SD curves are included as references in
the plots; signals above the 3 SD curve or below the -3 SD curve are considered
significant. The first figure features multiple significant peaks, whereas the second
figure shows no peaks at all. The peaks likely correspond to the times of the finite field
computations on the x-coordinate of the input point.
Figure 8-5: Differential EM signal of ECC double with correct guess
Figure 8-6: Differential EM signal of ECC double with incorrect guess
8.4 Results of Power Spectrum Density Analysis
As before, the 3 SD and -3 SD curves are used as references in power spectral density
analysis. Figure 8-7 shows the differential PSD signal of the ECC double with correct
bit partitioning, and figure 8-8 shows the differential PSD signal with an incorrect bit
guess. Clearly, no significant peaks are found even with the correct bit guess; in fact,
there is little discernible difference between figures 8-7 and 8-8.
However, when the PSD is computed only from the EM signals between 0.6 ms
and 1.2 ms, a significant differential peak at 4 MHz is found. Figures 8-9 and 8-10 show
the differential PSD signals for correct and incorrect bit partitioning respectively. This
shows that the PSD differential signal does feature peaks, but that they are averaged out
over the large capture interval of figure 8-7.
Figure 8-7: Differential EM PSD of ECC double with correct guess
Figure 8-8: Differential EM PSD of ECC double with incorrect guess
Figure 8-9: Differential PSD of ECC double with correct guess
Figure 8-10: Differential PSD of ECC double with incorrect guess
8.5 Results of Spectrogram Analysis
Figure 8-11 is the differential EM spectrogram for the ECC double operation with a
correct scalar bit guess, whereas figure 8-12 is for an incorrect scalar bit guess. The
3 SD and -3 SD curves are again included in the plots to help distinguish significant
peaks. The SD curves always peak at zero frequency, indicating that there are
considerable fluctuations of the average EM signal over different traces. Clearly, figure
8-11 features many significant peaks above the 3 SD curve and below the -3 SD curve.
Furthermore, peaks in figure 8-11 correlate with those in figure 8-5, such as those that
appear at 0.7 ms and 1.1 ms. This is expected, as the differential EM signal and the
differential EM spectrogram are simply two different perspectives on the same events
unfolding on the PDA device.
Figures 8-13 and 8-14 show the frequency domain signals of a single time frame
at 0.6 ms for correct and incorrect scalar bit guesses respectively. With the correct scalar
bit guess, a significant amount of signal lies beyond the -3 SD curve.
Figure 8-11: Differential EM spectro of ECC double with correct guess
Figure 8-12: Diff EM spectro of ECC double with incorrect guess
Figure 8-13: A frame of differential EM spectro with correct guess
Figure 8-14: A frame of differential EM spectro with incorrect guess
8.6 Comparisons
Table 8-1 shows the greatest multiple of SD_DOM that is less than a given percentile of
peaks for each type of analysis. For example, column two lists the multiples for time
domain analysis with correct partitioning. The second row lists the multiples of
SD_DOM below the top 10th percentile of peaks (the 10% greatest sample points). The
last row shows the ratio of the highest peak to SD_DOM.
Table 8-1: Greatest Multiples of SD_DOM below Peak Amplitude

    Percentile   Time       Time         Frequency  Frequency    Spectro    Spectro
    of Peaks     (correct)  (incorrect)  (correct)  (incorrect)  (correct)  (incorrect)
    10           2.99       1.69         1.66       1.64         5.10       1.49
    1            4.83       2.65         2.61       2.59         6.28       2.28
    0.1          5.86       3.29         3.30       3.31         7.04       2.81
    Highest      7.78       3.99         4.18       4.37         7.73       3.07
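Our reading of the table's metric can be sketched as follows; the exact definition used in the thesis may differ:

```java
import java.util.Arrays;

/** Sketch: the multiple of SD_DOM lying just below a given top-percentile
 *  of peak amplitudes in a differential signal. Interpretation is ours. */
public class PeakMetric {
    public static double multipleAtPercentile(double[] signal, double sdDom, double topPercent) {
        double[] mag = new double[signal.length];
        for (int i = 0; i < signal.length; i++) mag[i] = Math.abs(signal[i]);
        Arrays.sort(mag);                                       // ascending amplitudes
        int idx = (int) Math.ceil(mag.length * (1.0 - topPercent / 100.0));
        if (idx >= mag.length) idx = mag.length - 1;            // clamp to last sample
        return mag[idx] / sdDom;                                // multiple of SD_DOM
    }
}
```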
In time domain and spectrogram analysis, the multiples from correct partitioning
are significantly higher than those from incorrect partitioning. This indicates that
differential analysis was successful with time domain and spectrogram signals, as an
attacker can distinguish between correct and incorrect bit guesses.
In PSD, however, the multiples from correct partitioning are not significantly
higher than those from incorrect partitioning, indicating that differential analysis is
difficult with PSD. If the analysis is focused on a particular time frame within the
double operation, one can get much better differential signals with correct partitioning.
In general, PSD is not good for differential analysis because any local correlation
between EM signal and data is averaged out when the PSD is computed over a long
interval of time.
Although both time domain and spectrogram analysis are suitable for differential
analysis, spectrogram analysis is clearly better. The greatest multiples are
significantly higher for spectrogram than for time domain analysis, indicating that there
are more distinguishable peaks in spectrogram signals with correct partitioning and that
differential analysis is easier with the spectrogram. For example, in time domain
analysis the top 10th percentile of peaks is about three times SD_DOM, whereas in
spectrogram analysis it is about five times the reference. Therefore spectrogram
analysis is the most useful for DEMA.
9 Experimental Results of SEMA
This chapter presents results that show the usefulness of neural network programming in
SEMA attacks through comparison with template attacks. The attack target is the ECC
scalar multiplication algorithm based on the binary expansion of the scalar. The chapter
presents results on the optimization of neural network parameters, the optimization of
template attack parameters, point operation recognition accuracy, the effect of the
integer optimization model, and the effect of using multiple traces. The Matlab code of
the neural network for SEMA is given in Appendix D.
9.1 Setup
The EM emanation is captured with 50K sample points at a 2.5 MHz sampling rate over
20 ms. The double operation takes about 18 ms and the addition operation about 32 ms;
however, analyzing the first 20 ms of these operations is found to be sufficient to
distinguish their signals. The target algorithm is a 192-bit scalar multiplication, which
performs 192 double operations and up to 192 addition operations. At maximum
utilization of the oscilloscope memory, 656 fast frames can be acquired. At each
acquisition, the scope is configured to acquire 200 frames from double operations, 100
frames from addition operations, and an undetermined number of frames from a scalar
multiplication using a random scalar. The first 300 frames are used as training data, and
the frames from the scalar multiplication are used as testing data.
In the experiment, training and testing data from three acquisitions are used. In this
way, 900 training frames and testing data from three scalar multiplications are obtained.
This is needed to ensure the experimental results are consistent.
The figures below (figures 9-1 to 9-4) show EM signals generated from double and
addition operations, along with their average EM signals. Clearly, the EM signals are
quite different even for the same operation; this is evident from the fact that the average
signal over 150 executions is significantly reduced in amplitude. Since there is so much
variation between signals from the same operation, it is not easy to distinguish signals
from addition and double operations; distinguishing the operations from the average
signals is even more difficult. For this reason, even if an attacker can fix the scalar used
in the scalar multiplication, it is not useful to perform SEMA on average EM signals:
most of the distinguishing features are lost in the averaging.
Figure 9-1: EM Signal from ECC Double Operation #1
Figure 9-2: EM Signal from ECC Double Operation #2
Figure 9-3: EM Signal from ECC Addition Operation #1
Figure 9-4: EM Signal from ECC Addition Operation #2
9.2 Parameters of Neural Network
There are a number of parameters whose appropriate values must be found
experimentally. These parameter values are for the preprocessing unit and the neural
network. Those for preprocessing include the envelope size and window size; those for
the neural network include the number of neurons in each layer.
The goal is not to find the mathematically optimal values for these parameters, but
only to find reasonably good ones. The strategy is to use past research and heuristics to
choose a range of reasonable values for each parameter, try a subset of values within the
range to find the resulting classification error rates, and select the parameter value that
yields the lowest error rate. This process optimizes each parameter independently,
without considering the interdependency between parameter values.
9.3 Results of Neural Network using Time Domain signals
There are two parameter values for the preprocessing unit of a neural network using
time domain signals: the envelope size and the minimum variance fraction in principal
component analysis. A signal envelope reduces the signal size and filters out regular
fluctuations in the EM signals that do not contribute to classification accuracy.
However, if the signal envelope is too coarse, it removes important features from the
signal. The envelope size should be divisible by the window size. The optimal envelope
size is found experimentally to be 250 points, as shown in figure 9-5.
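A minimal sketch of an envelope step, assuming a peak-magnitude envelope (the thesis does not spell out its exact envelope definition, so this is one plausible reading):

```java
/** Sketch: collapse each window of `size` samples to its peak magnitude,
 *  shrinking the trace while preserving coarse amplitude features. */
public class Envelope {
    public static double[] envelope(double[] signal, int size) {
        int n = (signal.length + size - 1) / size;   // ceil(length / size) windows
        double[] env = new double[n];
        for (int w = 0; w < n; w++) {
            double peak = 0;
            int end = Math.min((w + 1) * size, signal.length);
            for (int i = w * size; i < end; i++)
                peak = Math.max(peak, Math.abs(signal[i]));
            env[w] = peak;
        }
        return env;
    }
}
```

A larger envelope size shrinks the input to the network more aggressively, which is the trade-off figure 9-5 explores.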
Figure 9-5: Plot of Accuracy vs. Envelope Size (classification error rate vs. envelope size)
The preprocessor performs PCA (principal component analysis) and retains the
components that contribute more than a specified fraction of the total variation in the
data set. The optimal value of this fraction is found to be 0.4%, as shown in figure 9-6.
Figure 9-6: Plot of Accuracy vs. Min Fraction (classification error rate vs. minimum variance fraction)
The size of the neural network is the number of neurons on each layer. The last layer
(the output layer) has only one neuron, whose value gives the final verdict on the
classification result. From past research, the number of neurons at each layer is on the
order of the input size; therefore the heuristic used is to try numbers that are multiples of
the input size. Furthermore, the same size is used for both hidden layers (layers 2 and 3).
The optimal numbers of neurons at the input layer and hidden layers are
interdependent, so it is not possible to optimize these values by considering them
separately. The heuristic below is used to optimize the number of neurons in the input
layer and hidden layers.
Algorithm 9-1: Optimization of the number of neurons
    Let a1 = current number of neurons at input layer
    Let b1 = current number of neurons at hidden layer
    Let a0 = previous number of neurons at input layer
    Let b0 = previous number of neurons at hidden layer
    a1 := 1; b1 := 1; a0 := 0; b0 := 0
    while (a0 <> a1 or b0 <> b1)
        a0 := a1
        Choose a1 using b1 with the least error
        b0 := b1
        Choose b1 using a1 with the least error
    end while
The idea is to iteratively optimize the number of neurons in the input layer and hidden
layers using the previously found optimal values for the hidden layer and input layer
respectively. This process continues until the optimization no longer changes the
number of neurons in the input and hidden layers. The optimal sizes of the input and
hidden layers are found to be 3 and 1 (as multiples of the input size) respectively. The
effects of using other sizes for the input and hidden layers are shown in the figures below.
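Algorithm 9-1 is a coordinate-descent loop, which can be sketched as follows; `error` below is a hypothetical stand-in for training the network with the given layer multiples and measuring the resulting test error:

```python
def optimize_layer_sizes(error, candidates=(1, 2, 3, 4), max_rounds=20):
    """Alternate between optimizing the input-layer and hidden-layer
    multipliers until neither changes (the loop of Algorithm 9-1).

    error: function(a, b) -> error rate for input multiple a, hidden multiple b.
    """
    a1, b1 = 1, 1
    a0, b0 = 0, 0
    rounds = 0
    while (a0 != a1 or b0 != b1) and rounds < max_rounds:
        a0 = a1
        a1 = min(candidates, key=lambda a: error(a, b1))  # best a for fixed b
        b0 = b1
        b1 = min(candidates, key=lambda b: error(a1, b))  # best b for fixed a
        rounds += 1
    return a1, b1

# Toy error surface whose minimum sits at a=3, b=1.
toy = lambda a, b: (a - 3) ** 2 + (b - 1) ** 2 + 0.1 * a * b
a, b = optimize_layer_sizes(toy)
```

The `max_rounds` guard is an added safety net: on a non-convex error surface the alternation could in principle cycle, a case Algorithm 9-1 itself does not address.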
[Chart omitted: error rate (0%–3.5%) plotted against input layer size (0–5 multiples of input size).]
Figure 9-7: Plot of Accuracy vs. Input Layer Size
[Chart omitted: error rate (0%–3.5%) plotted against hidden layer size (0–5 multiples of input size).]
Figure 9-8: Plot of Accuracy vs. Hidden Layer Size
There are a number of different algorithms for training a neural network. They
differ in computational time, and they also produce different classification error rates
even though each is trained to an MSE (mean square error) of less than 10^-10 on the
training data, as shown in table 9-1. This shows that two sets of weights adjusted to give
the same MSE on the training data can perform differently on new test data. For
instance, the network trained with the RP algorithm has significantly higher
classification error rates, while the error rates of networks trained by the other
algorithms are roughly comparable. The CGB algorithm appears to be slightly superior,
with the lowest error rate and computational time. Computational time is not an
important concern in a SEMA attack; it matters more in real-time applications. The
CPU time reported is proportional to the number of clock ticks required in the CPU.
Table 9-1: CPU time and Error % for Different NN Training Algorithms

    Algorithm   CGB    CGP    GDX    SCG    RP
    CPU Time    8.09   8.40   280    11.7   2.98
    Error %     2.64   2.77   2.72   2.83   4.11
The table below summarizes the best parameter values for a neural network system using
time domain signals.
Table 9-2: Optimal parameter values of a NN using Time Domain Signals

    Parameter   Envelope Size   Min. Fraction   Input Layer   Hidden Layer   Training Algorithm
    Value       250             0.4%            3x            1x             CGB
The resulting classification error rate is 2.64%, while the classification error rate on the
training data is 0%. The error rate is calculated from the classification of 9930 point operations.
9.4 Results of Neural Network using Spectrogram Signals
There are four parameter values for the preprocessing unit of a neural network using
spectrogram signals: window size, overlap size, envelope size, and the minimum variance
fraction in principal component analysis. The parameter values for the neural network
design include the number of neurons at each layer and the algorithm used to train the
network.
Spectrogram analysis performs frequency analysis on windows of the signal, and the
window size is the number of time sample points in each window. A larger window
captures higher frequency resolution but lower time resolution, while a smaller window
captures lower frequency resolution but higher time resolution. For instance, if the EM
signal pattern changes rapidly, a smaller window is needed to capture the EM signal at
shorter time intervals. On the other hand, if the information is contained in a large
frequency range of the EM signals, a larger window is needed to capture a wide band of
EM signals. The total number of samples should be divisible by the window size. The
optimal window size is found experimentally to be 2500 points, as shown in figure 9-9.
[Chart omitted: error rate (0%–3%) plotted against window size (0–6000 points).]
Figure 9-9: Plot of Accuracy vs. Window Size
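A from-scratch sketch of the windowed frequency analysis is given below. A real implementation would use an FFT (as Matlab's spectrogram routines do); the direct DFT here is only for illustration of the window/overlap mechanics:

```python
import cmath

def spectrogram(signal, win_size, overlap):
    """Split the signal into windows of win_size samples, overlapping by
    `overlap` samples, and return the DFT magnitudes of each window."""
    step = win_size - overlap
    frames = []
    for start in range(0, len(signal) - win_size + 1, step):
        win = signal[start:start + win_size]
        mags = []
        for k in range(win_size // 2 + 1):        # non-negative frequencies
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / win_size)
                        for n, x in enumerate(win))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames

# 16-sample toy signal at one cycle per 8 samples,
# 8-point windows with a 4-point overlap -> 3 frames.
sig = [cmath.cos(2 * cmath.pi * n / 8).real for n in range(16)]
frames = spectrogram(sig, 8, 4)
```

Each frame localizes the tone in time (its energy lands in frequency bin 1 of every window), which is the time-dependent frequency view that plain power spectrum density lacks.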
The spectrogram windows should overlap so that signal patterns near the edges of the
windows are not lost. The extent of the overlap depends on the input signals and
can be found experimentally. With the window size fixed at 2500 sample points, the
optimal overlap is 1000 points, as shown in figure 9-10.
[Chart omitted: error rate (2.05%–2.30%) plotted against overlap size (0–2500 points).]
Figure 9-10: Plot of Accuracy vs. Overlap Size
A signal envelope plays an important role in improving the input to the neural network.
The window size should be divisible by the envelope size. The most appropriate
envelope size is found experimentally to be 50 points, as shown in figure 9-11.
[Chart omitted: error rate (0%–4%) plotted against envelope size (0–200 points).]
Figure 9-11: Plot of Accuracy vs. Envelope Size
The preprocessor performs PCA and retains the components that contribute more than a
specified fraction of the total variation in the data set. The optimal value of this fraction
is found to be 0.4%, the same as the result found for time domain signals. The error
rates for different fraction values are shown in figure 9-12.
[Chart omitted: error rate (0%–2.5%) plotted against minimum fraction (0%–1%).]
Figure 9-12: Plot of Accuracy vs. Min. Fraction
The size of the neural network is defined by the number of neurons on each layer. Using
the same heuristic as before, the optimal numbers of neurons at the input and hidden
layers are 3 and 1 respectively. These numbers are the same as those found for time
domain signals. The effects of changing the sizes of the input and hidden layers are
shown in the figures below.
[Chart omitted: error rate (1.88%–2.06%) plotted against input layer size (0–5 multiples of input size).]
Figure 9-13: Plot of Accuracy vs. Input Layer Size
[Chart omitted: error rate (0%–3%) plotted against hidden layer size (0–3.5 multiples of input size).]
Figure 9-14: Plot of Accuracy vs. Hidden Layer Size
There are a number of different algorithms for training a neural network; their results
are shown in the table below. Aside from the RP training algorithm, all of the training
algorithms are able to train a neural network that achieves an error rate of about 2%.
The SCG algorithm appears to be superior, as it provides the lowest error rate at a
moderate computational cost.
Table 9-3: CPU time and Error % for Different NN Training Algorithms

    Algorithm   CGB    CGP    GDX    SCG    RP
    CPU Time    14.9   6.98   45.8   14.2   5.45
    Error %     1.89   2.00   1.98   1.87   3.89
The table below summarizes the best parameter values for a neural network system using
spectrogram signals.
Table 9-4: Optimal parameter values of a NN using Spectrogram Signals

    Parameter   Window Size   Overlap Size   Envelope Size   Min. Fraction   Input Layer   Hidden Layer   Training Algorithm
    Value       2500          1000           50              0.4%            3x            1x             SCG
The lowest error rate with the parameters above is 1.87%, while the error rate when the
neural network is tested on training data is 0%. The error rate is calculated from the
classification of 9930 operations.
9.5 Results of Template Attack
As with the neural network, training data are needed to create templates for the different
point operations: addition and double operations. The training and testing data also need
to be preprocessed to reduce their dimension and to remove unnecessary signals that do
not contribute to classification accuracy.
The preprocessing unit consists of two steps. The first step is to create an envelope
of the input signal, and the second step is to select the points that have the greatest
differences in means between the two operations.
After the training data are enveloped, they are grouped into two sets for the addition and
double operations. The standard deviation of the difference of means (SD_DOM) is
found for the two sets of signals. The signal points in SD_DOM with the highest
amplitude correspond to sample points with the largest variations between the two
operations. These signal points are useful for classification, and they are selected to form
the observation matrices for the two operations. Each column of an observation matrix
represents a sample of the experiment, and each row represents a sampling point.
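A simplified sketch of this point-selection step follows; it keeps the sample indices where the two class means differ the most, as a stand-in for the full SD_DOM ranking (the normalization step is omitted here for brevity):

```python
def select_points(traces_add, traces_dbl, num_points):
    """Pick the sample indices where the mean addition trace and the mean
    double trace differ the most (simplified stand-in for SD_DOM ranking)."""
    n = len(traces_add[0])
    mean_add = [sum(t[i] for t in traces_add) / len(traces_add) for i in range(n)]
    mean_dbl = [sum(t[i] for t in traces_dbl) / len(traces_dbl) for i in range(n)]
    diff = [abs(a - d) for a, d in zip(mean_add, mean_dbl)]
    # Indices of the num_points largest differences, returned in time order.
    ranked = sorted(range(n), key=lambda i: diff[i], reverse=True)[:num_points]
    return sorted(ranked)

# Toy 4-sample traces: samples 1 and 3 carry operation-dependent signal.
adds = [[0.0, 1.0, 0.0, 5.0], [0.0, 1.2, 0.0, 4.8]]
dbls = [[0.0, 3.0, 0.0, 5.1], [0.0, 2.8, 0.0, 4.9]]
points = select_points(adds, dbls, 2)
```

The selected indices then define the rows of the observation matrices described above.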
A template comprises a signal component and a noise component. The average of an
observation matrix forms the signal component of a template. A noise vector is
calculated as the difference between a column of the observation matrix and the average
signal. The noise covariance matrix is computed from the noise matrix, and it forms the
noise component of the template.
Using the template, one can calculate the probability of observing a given noise vector.
Given a signal from an unknown operation, one can make a classification decision by
finding the probability of observing its noise vector under the assumption that the signal
comes from an addition or from a double operation. For example, if the probability of
observing the given signal from the addition operation is higher, the source is classified
as an addition operation.
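The template construction and the maximum-likelihood decision can be sketched as follows. For brevity this sketch uses a diagonal (per-point variance) approximation rather than the full noise covariance matrix described above:

```python
import math

def build_template(observations):
    """Mean and per-point noise variance of a set of selected-point vectors
    (a diagonal-covariance simplification of the full noise covariance)."""
    n, m = len(observations), len(observations[0])
    mean = [sum(o[i] for o in observations) / n for i in range(m)]
    var = [sum((o[i] - mean[i]) ** 2 for o in observations) / n for i in range(m)]
    return mean, var

def log_likelihood(template, signal):
    """Log-probability of the noise vector (signal - mean) under the template,
    assuming independent Gaussian noise at each selected point."""
    mean, var = template
    ll = 0.0
    for x, mu, v in zip(signal, mean, var):
        ll += -0.5 * math.log(2 * math.pi * v) - (x - mu) ** 2 / (2 * v)
    return ll

def classify(t_add, t_dbl, signal):
    """Pick the operation whose template makes the observed signal likelier."""
    if log_likelihood(t_add, signal) > log_likelihood(t_dbl, signal):
        return "add"
    return "dbl"

t_add = build_template([[1.0, 5.0], [1.2, 5.2], [0.8, 4.8]])
t_dbl = build_template([[3.0, 5.0], [3.2, 5.2], [2.8, 4.8]])
label = classify(t_add, t_dbl, [1.1, 5.1])
```

As the comparison section later argues, this Gaussian-noise assumption is exactly what limits the template approach on real EM traces.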
Figure 9-15 shows how the classification accuracy varies with the size of the
observation column vector for different envelope sizes. A bigger observation vector
stores more sample points from the input signal. However, the error rate does not
necessarily decrease with a bigger observation vector: an excessively large vector
introduces too much useless information into the template, which may cause more
classification errors. The best configuration uses an envelope size of 500 points and
70 rows in the observation column vector, and gives an error rate of 18.8%.
[Chart omitted: error rate (0%–45%) plotted against observation size (10–80 rows), with one curve per envelope size (200, 250, 500 points).]
Figure 9-15: Plot of Accuracy vs. Observation Size
A sanity check of these classification systems is to check their performance when
classifying training data (data that the system has seen before). Classification of
training data is expected to be much more accurate. A system that has good
classification accuracy on training data is said to have good memory.
The quality of memory is clearly proportional to the size of the observation
column vector, as shown in figure 9-16. In fact, an observation column vector with 100
entries or more can have perfect or near-perfect memory. However, a vector of this size
usually performs poorly with new testing data. This shows that a classification system
with better memory is not necessarily better at classifying new data.
[Chart omitted: memory error rate (0%–40%) plotted against observation size (10–50 rows), with one curve per envelope size (200, 250, 500, 1000 points).]
Figure 9-16: Plot of Memory vs. Observation Size
9.6 Results of Averaging and Integer Optimization Model
The classification accuracy can be increased if the classification results over many
executions are combined by averaging. As more executions are used, the averaged
classification result becomes more consistent and accurate.
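The averaging step can be sketched as follows, assuming the classifier emits a score in [-1, +1] for each point operation of each execution, and that the same scalar (and hence the same operation sequence) is used in every execution:

```python
def average_classify(scores_per_execution):
    """Combine per-operation classifier outputs (values in [-1, +1]) from
    several executions by averaging, then take the sign as the decision."""
    n_ops = len(scores_per_execution[0])
    decisions = []
    for i in range(n_ops):
        avg = sum(run[i] for run in scores_per_execution) / len(scores_per_execution)
        decisions.append(1 if avg > 0 else -1)
    return decisions

# Three executions; the second run misclassifies operation 1,
# but averaging over the three runs recovers the correct label.
runs = [[0.9, -0.7, 0.8],
        [0.8,  0.2, 0.7],
        [0.7, -0.6, 0.9]]
labels = average_classify(runs)
```

This is why the error rate in table 9-5 drops as the number of executions grows: isolated per-execution mistakes are outvoted by the other runs.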
The classification error rates using the different classification systems in conjunction
with the averaging and integer optimization techniques are shown in the table below.
Table 9-5: % Error rate of different algorithms

    # of          Template Attack     NN with Time Domain   NN with Spectrogram
    Executions    w/o IP    w/ IP     w/o IP    w/ IP       w/o IP    w/ IP
    1             18.8      7.31      2.64      1.02        1.87      0.76
    2             15.6      6.88      0.86      0.65        0.61      0.41
    3             13.9      3.13      0.71      0.25        0.52      0.32
Figure 9-17 shows the classification accuracy using a neural network on
spectrogram signals, with respect to the number of executions and whether integer
optimization is used for error correction.
A few trends can be observed from the experimental data. First, the error rate
decreases as results from more executions are used. However, the rate of this decrease
slows down with more executions. Secondly, the classification accuracy is significantly
better with an integer programming model. This indicates the integer programming
model is able to correct some classification errors. However, the effectiveness of the
corrections also diminishes as classification approaches perfect accuracy.
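The integer model of Appendix C can be illustrated with a brute-force miniature in Python (practical instances are solved with a MIP solver, not by enumeration; this toy only demonstrates the constraints):

```python
from itertools import combinations

def correct_with_ip(prob_double, num_doubles):
    """Brute-force miniature of the integer model: choose which time slots
    are double operations so that the total double-probability is maximized,
    exactly num_doubles slots are doubles, and no two adjacent slots are
    both additions (mirroring the NUMDBL and NO2ADJ constraints)."""
    n = len(prob_double)
    best, best_score = None, float("-inf")
    for dbl_slots in combinations(range(n), num_doubles):
        chosen = set(dbl_slots)
        # NO2ADJ: every adjacent pair of slots must contain >= 1 double.
        if any(t not in chosen and t + 1 not in chosen for t in range(n - 1)):
            continue
        score = sum(prob_double[t] for t in chosen)
        if score > best_score:
            best, best_score = chosen, score
    return sorted(best)

# Classifier's per-slot probability that the slot holds a double operation.
p = [0.9, 0.4, 0.6, 0.95, 0.3]
slots = correct_with_ip(p, 3)
```

The no-two-adjacent-additions constraint encodes a structural fact of the binary scalar multiplication: every point addition is followed by a doubling, so an assignment that violates it must contain a classification error, which the model is free to repair.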
[Chart omitted: error rate (0%–2%) plotted against number of executions (1–3), with and without the IP model.]
Figure 9-17: Plot of Accuracy vs. Number of Executions
The final goal of SEMA is to completely recover a scalar. For a 196-bit scalar, the
probability of total scalar recovery for the neural network system using spectrogram
signals is shown in the figure below. Although the neural network system using
spectrograms can achieve a very low bit classification error rate of 1.87%, the success
rate of total recovery of the scalar is only about 2%. With 3 executions, however, the
success rate increases to about 36%. Figure 9-18 also shows the positive effect of using
the integer optimization model for error correction. In the one-execution case, the use of
an IP model increases accuracy from 2% to 22%; in the three-execution case, it increases
accuracy from 36% to 53%.
[Chart omitted: success rate (0%–60%) plotted against number of executions (1–3), with and without the IP model.]
Figure 9-18: Plot of Accuracy vs. Number of Executions
9.7 Comparisons
In this section, the experimental results of the template and neural network attacks are
examined. The signal detection theory approach of the template attack is found to be
ineffective at classifying point operations: the best error rate achieved with this approach
is only about 18.8%. A neural network system, however, can achieve an error rate of
2.64%, and the use of spectrogram signals further decreases this error rate to 1.87%.
As impressive as this may be, a bit classification error rate of 1.87% is still very
high for the complete recovery of a scalar with 196 bits or more. Two further approaches
are to combine classification results over many executions and to use integer
optimization for error correction. These approaches are found to be very effective at
increasing the overall success rate of SEMA.
10 Discussion and Conclusions
10.1 Limitation of Research and Implementation
Due to the memory constraints of the digital oscilloscope, a tradeoff must be made
between the number of fast frames and the number of sample points. Using more fast
frames would yield more consistent results for DEMA and more training data for SEMA.
Using more sample points would yield a greater sampling frequency and better results
for both DEMA and SEMA.
In the experiments, approximately 1000 acquisitions were used to ensure
acceptable consistency in DEMA and a sufficient amount of training data in SEMA. The
sampling frequency is set to the minimum frequency that provides sufficient detail for
SEMA and DEMA. This often requires using a frame duration shorter than the duration
of the operation, due to the limited number of sample points. For example, in DEMA of
the double operation, the capture duration is only 2 ms whereas the entire double
operation takes 20 ms. Only results from the section with the highest differential signal
were shown.
It is possible to merge the results from more than one experiment; for example, the
data for SEMA consist of acquisitions from three experiments. However, the maximum
number of combined acquisitions is still constrained by the memory available on the
computer that processes the acquisitions.
A valid criticism of the SEMA and DEMA experiments is that they focused solely
on the basic ECC scalar multiplication algorithm, based on the binary expansion of the
input scalar and computed over the prime field GF(p). Most optimized implementations
use a window algorithm for scalar multiplication. However, the methodology proposed
here for SEMA and DEMA also works for a sliding window algorithm, although no
experimental data are shown.
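For reference, the basic binary (double-and-add) algorithm that the experiments target can be sketched as follows. Integer arithmetic stands in for the ECC group operations here, and the recorded call sequence is exactly the double/add pattern that SEMA observes:

```python
def scalar_mult(k_bits, point, dbl, add, identity):
    """Left-to-right binary (double-and-add) scalar multiplication.
    k_bits is the scalar in binary, most significant bit first; dbl/add
    are the group operations. A double is performed for every bit, but an
    add only for the 1 bits -- the leakage SEMA exploits."""
    q = identity
    for bit in k_bits:
        q = dbl(q)                 # performed for every scalar bit
        if bit == 1:
            q = add(q, point)      # performed only when the bit is 1
    return q

# Integer arithmetic stands in for the ECC group: "doubling" is *2 and
# "addition" is +, so the result is simply k * point.
trace = []
dbl = lambda q: (trace.append("D"), 2 * q)[1]
add = lambda q, p: (trace.append("A"), q + p)[1]
result = scalar_mult([1, 0, 1, 1], 5, dbl, add, 0)   # k = 0b1011 = 11
```

Reading the recorded trace, a "D" followed by an "A" reveals a 1 bit and a lone "D" reveals a 0 bit, which is precisely why distinguishing the two operations suffices to recover the scalar.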
All experiments are performed on our own implementations of the ECC
computations. The target PDA only supports third-party applications written in Java,
and consequently the ECC computations are implemented in Java. As well, a trigger
signal is generated before each ECC computation, which makes the EM analysis attacks
much simpler. In a complete attack, the attacker needs to identify the starting points of
the point operations to be analyzed.
Finally, this research is not focused on the neural network programming paradigm
in artificial intelligence. Limited research effort was devoted to improving the
algorithms for neural networks; instead, the most typical neural network algorithms and
configurations are used in the SEMA experiments.
10.2 Summary
The purpose of this thesis is to describe new methodologies of SEMA and DEMA against
handheld computing devices. The two main contributions of this research are the novel
uses of spectrogram analysis and of neural network programming for EM analysis
attacks; neither technique has been applied to EM analysis attacks before. As well, there
is no previous experimental work on EM analysis against ECC computations on PDA
implementations.
EM analysis is particularly devastating for handheld devices, as they are more
likely to be exposed to adversaries and their EM signals may be easily captured due to
their device characteristics. In particular, this work focuses on EM analysis of the scalar
multiplication algorithm, which is the dominant operation in an ECC cryptographic
system.
DEMA targets the ECC point double operation, as it is performed at every
iteration regardless of the scalar value. DEMA recovers the secret key by statistical
analysis of the EM traces over many runs of the scalar multiplication operation. It is
discovered that the optimal way of splitting traces is to partition on the MSB of a
point coordinate for prime fields. This is because the MSB value has the greatest
correlation with whether an operation on that value results in a carry-out, which triggers
a series of different computations that produce differential signals in the EM side
channel. The use of other partition bits yields significantly lower differential signals.
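The trace-splitting step of DEMA can be sketched as follows; the toy traces here are fabricated so that one sample leaks the carry-out behaviour, purely for illustration:

```python
def dema_differential(traces, msb_bits):
    """Split EM traces into two sets by the MSB of the predicted intermediate
    coordinate and return the difference-of-means trace. A correct prediction
    of the MSB yields visible peaks; a wrong one averages the signal away."""
    set1 = [t for t, b in zip(traces, msb_bits) if b == 1]
    set0 = [t for t, b in zip(traces, msb_bits) if b == 0]
    n = len(traces[0])
    mean1 = [sum(t[i] for t in set1) / len(set1) for i in range(n)]
    mean0 = [sum(t[i] for t in set0) / len(set0) for i in range(n)]
    return [a - b for a, b in zip(mean1, mean0)]

# Toy traces: only sample 2 depends on the carry-out triggered when MSB = 1.
traces = [[1.0, 2.0, 5.0], [1.0, 2.0, 1.0], [1.0, 2.0, 5.2], [1.0, 2.0, 0.8]]
bits = [1, 0, 1, 0]
diff = dema_differential(traces, bits)
```

The peak in the differential trace at the carry-dependent sample is the signal whose significance is then judged against multiples of SD_DOM, as described below.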
A new quantitative way to judge whether a differential signal is significant was
proposed, which measures the percentile of peaks at different multiples of SD_DOM.
A new type of DEMA that analyzes the EM signals using power spectrum density
and spectrograms was proposed. Spectrogram signals are found to be better suited for
differential analysis. Power spectrum density signals are not effective for differential
analysis because any differential signal in a small time interval gets smeared over the
entire interval of the frequency analysis. Spectrograms do not suffer from this problem
because a spectrogram is a form of time-dependent frequency analysis. As well,
spectrograms are superior to time domain analysis because they are less vulnerable to
jitter in the original EM signals.
Finally, the proposed attack methodology can be extrapolated to different attack
scenarios: where the timings of point operations are known, where the timings are not
known, or where a window multiplication method is used.
A new, innovative technique of SEMA using an artificial intelligence
programming paradigm known as neural networks is proposed to distinguish between
point addition and double operations. The neural network must be presented with
training data so that it can be trained to recognize the point operations. In classification,
the neural network returns a value between +1 and -1: if the result is positive, the
operation is classified as a point doubling; otherwise it is classified as a point addition.
All training and testing data must be preprocessed to reduce the dimensionality of the
input data. The parameters for the preprocessing and neural network units must be found
by experiment and the use of heuristics.
For reference, the accuracy of the new technique is compared against a
technique based on template attacks using optimal signal detection theory. There are
many similarities between the neural network and template attack strategies; one can
consider the weights and parameters of a neural network as a template. Experimentally,
a SEMA based on signal detection theory is found to be much less effective, as the
underlying algorithm assumes a received signal is a linear combination of Gaussian noise
and the underlying signal. In reality, the noise in the ambient environment is not
Gaussian. More importantly, the underlying EM signal is distorted non-linearly by
events that occur in the run-time environment of the computing device. It is found that
the use of spectrogram signals for classification can slightly improve the classification
accuracy of the neural network system.
There are other techniques that can further improve the classification accuracy.
Taking an average of the classification results over many executions can drastically
improve the accuracy, and in practice the attacker is expected to be able to obtain EM
signals over multiple executions for SEMA. Furthermore, the use of an ILP model is
effective for error correction, which leads to greater classification accuracy.
10.3 Countermeasures
There are three common approaches to resist simple analysis (i.e. SEMA):
indistinguishable formulas for point operations [LS01] [JQ01], identical operation
sequence independent of key bits [Cor99] [Mo87], and random addition-subtraction chain
[OA01]. These approaches are described in chapter 4.
The use of indistinguishable formulas appears to be the best of the three. The use of an
identical operation sequence has considerable overhead because it performs a point
addition and a doubling at every iteration regardless of the scalar bit values. The random
addition-subtraction chain is vulnerable in light of the new, accurate classification
algorithm based on neural networks: with an accurate classification system and a hidden
Markov model [KW03], the scalar bits may be recovered.
Scalar multiplication algorithms that are secure against simple analysis may still be
vulnerable to differential analysis (i.e. DEMA). There are two common approaches to
resisting differential analysis: randomizing the base point P and randomizing the scalar k
in the scalar multiplication. Both of these approaches can effectively counter differential
analysis.
One shortcoming of these conventional countermeasures is that they all rely on
randomization. However, implementing a true random number generator is very difficult
and costly. Most implementations use pseudo-random generators and derive seed values
from non-random sources such as the time of day or mouse movements. These pseudo-
random generators are prone to attacks. A superior type of countermeasure is one that
does not require the use of any random values.
In the differential analysis experiments, the MSB of the input point is always
chosen as the partition bit. When the MSB is one, there is a much greater chance that a
carry-out will occur in the finite field calculations on the input point. In fact, this bit
is the only partition bit that works for differential analysis. A possible countermeasure is
to use the same algorithm for finite field computations regardless of whether a carry-out
occurs.
10.4 Future Work
Currently, all point operations activate the trigger signal to indicate the timing of point
operations to the digital oscilloscope. Future research should be devoted to
performing SEMA and DEMA without this trigger signal, as an adversary would not
have one available in a complete attack. This may be a particularly challenging
research problem for SEMA, as it requires precise timing information for the point
operations. It is perhaps similar to the segmentation problem in speech analysis,
where the system needs to identify the timings of the words spoken in a conversation.
Further research can be devoted to applying the techniques in this thesis to more
practical ECC scalar multiplication algorithms such as window scalar multiplication.
It may also be worthwhile to attempt an attack on an algorithm that is resistant to SEMA
and DEMA, to see how well it resists the new attack methodologies proposed here.
There is much similarity between template attacks and the neural network
classification system. It should be possible to modify the neural network system to
perform DEMA, as is done with template attacks. The pruning techniques used in
template attacks can be applied directly to the neural network system.
There are many details of the neural network that can be further investigated. A
typical back-propagation network is used in these experiments; however, other networks
such as radial basis networks are also possible and are used in other recognition systems
[MW]. As well, the preprocessing employed here uses a very common strategy for
minimizing data redundancy; perhaps a better technique tailored toward EM signals can
be developed for preprocessing.
As well, classification systems for speech and images use many different
techniques besides neural networks. Some speech analysis systems use techniques based
on HMMs (Hidden Markov Models). These techniques may prove to be useful for EM
analysis attacks as well.
The techniques developed here can be applied to other side channel sources, such
as power. It would be of interest to see how effective these techniques are on other side
channel sources.
Appendix A – Java API for 192-bit Prime Field (pf192)
public pf192(long[] seg)
    Construct a pf192 object with a 192-bit element represented as an array of six long integers

public pf192(String bitString)
    Construct a pf192 object with a 192-bit element represented as a hexadecimal string

public pf192 add(pf192 a)
    Return the sum of this pf192 element and the given element

public static pf192 add(pf192 a, pf192 b)
    Return the sum of the two pf192 elements

public pf192 sub(pf192 a)
    Return the result of this pf192 element minus the given element

public static pf192 sub(pf192 a, pf192 b)
    Return the result of pf192 element a minus element b

public pf192 mul(pf192 a)
    Return the product of this pf192 element and the given element

public static pf192 mul(pf192 a, pf192 b)
    Return the product of pf192 elements a and b

public pf192 square()
    Return the square of this pf192 element

public pf192 inv()
    Return the inverse of this pf192 element

public pf192 lshift(int shift)
    Left-shift this pf192 element by a given number of bits. This is used for scalar multiplication.

public byte compareTo(pf192 element)
    Compare this pf192 element with a given element. Return 0 if they are equal, -1 if they are additive inverses, and +1 otherwise

public boolean equalTo(pf192 element)
    Return true if this pf192 element equals the given element; false otherwise
Appendix B – Java API for 192-bit ECC (ECC_P192)
public ECC_P192()
    Construct a point at infinity

public ECC_P192(pf192 x, pf192 y)
    Construct an affine point with x and y coordinates

public ECC_P192(pf192 x, pf192 y, pf192 z)
    Construct a Jacobian point with x, y and z coordinates

public ECC_P192(pf192 x, pf192 y, pf192 z, pf192 z2, pf192 z3)
    Construct a Chudnovsky Jacobian point with x, y, z, z2 and z3 coordinates

public ECC_P192(ECC_P192 pt)
    Copy constructor

public ECC_P192 dblAC()
    Return the double of this affine point as a Chudnovsky Jacobian point

public ECC_P192 dblAJ()
    Return the double of this affine point as a Jacobian point

public ECC_P192 dblJJ()
    Return the double of this Jacobian point as a Jacobian point

public ECC_P192 addCCC(ECC_P192 pt)
    Return the sum of this point and a given point; all are in Chudnovsky Jacobian coordinates

public ECC_P192 addJAJ(ECC_P192 pt)
    Return the sum of this Jacobian point and a given affine point, as a Jacobian point

public ECC_P192 addJCJ(ECC_P192 pt)
    Return the sum of this Jacobian point and a given Chudnovsky Jacobian point, as a Jacobian point

public void toAffine()
    Convert this point to affine coordinates

public String toString()
    Return the hexadecimal values of the point coordinates

public int compareTo(ECC_P192 pt)
    Compare this point to a given point. Return 0 if they are equal, -1 if they are additive inverses, +1 otherwise
public ECC_P192 binary(int[] scalar)
    Return the scalar multiple of this element by the given scalar, using the multiplication algorithm based on the binary expansion of the scalar bits

public ECC_P192 slidewin(int[] scalar)
    Return the scalar multiple of this element by the given scalar, using the sliding window algorithm
Appendix C – GAMS Model for Integer Optimization

SETS
    T   time slots /1*331/;
ALIAS (T,T1),(T,T2);

PARAMETERS
    P(T)   probability value /1 0.318248  2 0.988648 …/
    C(T)   check value       /1 1  2 1 …/;

SCALAR D number of double ops /191/;

VARIABLES
    X(T)   assigns this time slot for dbl op
    Z      maximize total probability
    ERR    bit error rate;
BINARY VARIABLE X;

EQUATIONS
    PROB            define objective function
    NUMDBL          restrict the number of dbl op
    NO2ADJ(T1,T2)   no two adjacent add ops allowed
    BERR            calculate error rate;

PROB ..   Z =E= SUM(T, P(T)*X(T));
NUMDBL .. SUM(T, X(T)) =E= D;
NO2ADJ(T1,T2)$(ord(T1) eq (ord(T2)-1)) .. X(T1) + X(T2) =G= 1;
BERR ..   ERR =E= (card(T)-SUM(T, (2*X(T)-1)*C(T)))/2;

OPTION SUBSYSTEMS;
model ready /all/;
solve ready using MIP maximizing Z;
Appendix D – Matlab Code for Neural Networks

% AI SPA attack

% General parameters
record_len = param(1);
period = param(2)*1000;     % ms
sample_rate = 1/period;     % kHz
record_count = param(6);

% Pre-processing parameters
win_size = 2500;
step_size = 1500;
env_size = 50;

% Load training and testing data
load('../mult/init0.mat', '-mat');
feature0 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;
load('../mult/init1.mat', '-mat');
feature1 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;
load('../mult/init2.mat', '-mat');
feature2 = SpecPreProcess(em_dat, win_size, step_size, env_size);
clear em_dat;

% Format training data
train_result = repmat([ones(1,150) -1*ones(1,150)], 1, 3);
train_feature1 = [feature0(:,1:300) feature1(:,1:300) feature2(:,1:300)];
train_feature = [train_feature1 train_feature1 feature0(:,1:300)];

% Preprocess training data
[pn,meanp,stdp] = prestd(train_feature);
[ptrans,transMat] = prepca(pn,0.004);
[i, trainSize] = size(train_feature1);
ptrans = ptrans(:,1:trainSize);
train_result = train_result(:,1:trainSize);

% Format testing data
test_feature = [feature0(:,301:631) feature1(:,301:631) feature2(:,301:631)];
test_result = [1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;
    1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;-1;1;1;1;-1;1;-1;
    1;1;1;-1;1;-1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;-1;1;1;1;1;1;1;-1;1;-1;1;1;-1;
    1;-1;1;1;1;1;1;-1;1;1;-1;1;1;1;1;-1;1;-1;1;1;-1;1;1;-1;1;-1;1;-1;1;-1;1;1;1;-1;1;1;1;-1;1;-1;
    1;1;-1;1;-1;1;1;1;1;-1;1;-1;1;1;-1;1;-1;1;1;-1;1;1;1;-1;1;-1;1;1;-1;1;1;1;-1;1;1;1;1;-1;1;1;
    -1;1;1;1;1;1;1;-1;1;-1;1;1;1;1];

% Preprocess testing data
[p2n] = trastd(test_feature,meanp,stdp);
[p2trans] = trapca(p2n,transMat);
[i, testSize] = size(test_feature);
[dlen, i] = size(ptrans);

% Testing
tsize = 10;    % test 10 times
miss = 0;
elapse = 0;
for tstep = 1:tsize
    % Create neural network
    net1 = newff(minmax(p2trans),[3*dlen 1*dlen 1*dlen 1],{'tansig' 'tansig' 'tansig' 'tansig'},'trainrp');
    net1.trainParam.epochs = 1000;
    net1.trainParam.goal = 1e-10;
    net1.trainParam.min_grad = 0;

    % Train and simulate neural network
    stime = cputime;
    [net1,tr] = train(net1,ptrans,train_result);
    elapse = elapse + cputime - stime;
    op = sim(net1,p2trans);

    % Compare test results
    for j = 1:testSize
        if (sign(op(j)) ~= sign(test_result(mod(j-1,331)+1)))
            miss = miss + 1;
        end
        %score = score + op(j) * test_result(j);
    end
end

% Display results
s = sprintf('Number of misses: %d (%g out of %d)', miss, miss/tsize/testSize, 331);
disp(s);
s = sprintf('Time elapsed: %g', elapse/tsize);
disp(s);
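The preprocessing helpers used above (prestd, prepca, trastd, trapca) come from the MATLAB Neural Network Toolbox of that era: they standardize each feature row to zero mean and unit variance, then project onto the principal components, discarding components that contribute less than a given fraction (here 0.004) of the total variance. As an illustrative sketch only (not part of the thesis code), the same preprocessing steps can be expressed in NumPy; the function names mirror the MATLAB helpers but are otherwise hypothetical:

```python
import numpy as np

def prestd(p):
    """Standardize each row (feature) to zero mean and unit variance,
    like the MATLAB NN Toolbox prestd."""
    meanp = p.mean(axis=1, keepdims=True)
    stdp = p.std(axis=1, ddof=1, keepdims=True)
    return (p - meanp) / stdp, meanp, stdp

def prepca(pn, min_frac):
    """Project onto principal components, discarding those contributing
    less than min_frac of the total variance, like prepca."""
    u, s, _ = np.linalg.svd(pn, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    trans_mat = u[:, var_frac > min_frac].T   # rows = kept components
    return trans_mat @ pn, trans_mat

def trastd(p2, meanp, stdp):
    """Standardize new data with the training-set statistics, like trastd."""
    return (p2 - meanp) / stdp

# Toy demo: 4 features x 10 observations (random stand-in for EM features)
rng = np.random.default_rng(0)
p = rng.normal(size=(4, 10))
pn, meanp, stdp = prestd(p)
ptrans, trans_mat = prepca(pn, 0.004)
# trapca equivalent: apply the training projection to standardized test data
p2trans = trans_mat @ trastd(p, meanp, stdp)
```

Applying the training-set mean, standard deviation, and projection matrix to the test data (rather than recomputing them) matches the trastd/trapca calls in the attack code and keeps the training and test features in the same coordinate system.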
Bibliography
[AAR02] D. Agrawal, B. Archambeault, J.R. Rao, and P. Rohatgi, "EM side-channel(s): attacks and assessment methodologies," April 2005; http://www.research.ibm.com/intsec/emf-paper.ps.

[ANC] GuruNet, "Answers.com," April 2005; http://www.answers.com/topic/gradient-descent.

[ANSI] ANSI X9.62, Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1999.

[B95] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

[BDL] D. Boneh, R.A. DeMillo, and R.J. Lipton, "On the importance of checking computations," April 19, 2005; http://jya.com/smart.pdf.

[BHL00] M. Brown, D. Hankerson, J. Lopez, and A. Menezes, "Software implementation of NIST elliptic curves over binary fields," CHES 2000, LNCS 1965, Springer-Verlag, 2000, pp. 1 ff.

[BJ02] E. Brier and M. Joye, "Weierstraß elliptic curves and side-channel attacks," PKC 2002, LNCS 2274, Springer-Verlag, 2002, pp. 335-345.

[BS96] E. Biham and A. Shamir, "Research announcement: A new cryptanalytic attack on DES," October 18, 1996; http://jya.com/dfa.htm.

[C99] J. Coron, "Resistance against differential power analysis for elliptic curve cryptosystems," CHES 1999, LNCS 1717, Springer-Verlag, 1999, pp. 292-302.

[CC87] D. Chudnovsky and G. Chudnovsky, "Sequences of numbers generated by addition in formal groups and new primality and factoring tests," Advances in Applied Mathematics, 1987, pp. 385-434.

[CRR02] S. Chari, J.R. Rao, and P. Rohatgi, "Template attacks," CHES 2002, LNCS 2523, Springer-Verlag, 2002, pp. 172-186.

[EM6990] Electro-Metrics Inc., Instruction Manual: Broadband Amplifier Model EM-6990, 2004.

[EM6992] Electro-Metrics Inc., Instruction Manual: Near Field Probe Set Broadband Response Model EM-6992, 2004.

[GMO01] K. Gandolfi, C. Mourtel, and F. Olivier, "Electromagnetic analysis: concrete results," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 251-261.

[IT02a] T. Izu and T. Takagi, "A fast parallel elliptic curve multiplication resistant against side channel attacks," Technical Report CORR 2002-03, University of Waterloo, 2002; http://www.cacr.math.uwaterloo.ca/.

[IT02b] T. Izu and T. Takagi, "On the security of Brier-Joye's addition formula for Weierstrass-form elliptic curves," Technical Report No. TI-3/02, Technische Universität Darmstadt, 2002; http://www.informatik.tu-darmstadt.de/TI/.

[JQ01] M. Joye and J. Quisquater, "Hessian elliptic curves and side-channel attacks," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 402-410.

[K96] P. Kocher, "Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems," CRYPTO '96, 1996, pp. 104-113.

[KJJ99] P. Kocher, J. Jaffe, and B. Jun, "Differential power analysis," CRYPTO '99, Springer-Verlag, 1999, pp. 388-397.

[Koblitz] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, 1987, pp. 203-209.

[KW03] C. Karlof and D. Wagner, "Hidden Markov model cryptanalysis," CHES 2003.

[LS01] P. Liardet and N. Smart, "Preventing SPA/DPA in ECC systems using the Jacobi form," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 391-401.

[M87] P. Montgomery, "Speeding the Pollard and elliptic curve methods of factorization," Mathematics of Computation, vol. 48, 1987, pp. 243-264.

[Miller] V. Miller, "Use of elliptic curves in cryptography," CRYPTO '85, LNCS 218, Springer-Verlag, 1986, pp. 417-426.

[Murray] K.D. Murray, "The great seal bug story," Murray Associates, 2002.

[MW] MathWorks, "Online MATLAB documentation," April 2005; http://www.mathworks.com/access/helpdesk/help/helpdesk.html.

[NIST] National Institute of Standards and Technology, Recommended Elliptic Curves for Federal Government Use, Appendix to FIPS 186-2, 2000.

[O02] E. Oswald, "Enhancing simple power-analysis attacks on elliptic curve cryptosystems," CHES 2002, LNCS 2523, Springer-Verlag, 2002, pp. 82 ff.

[OA01] E. Oswald and M. Aigner, "Randomized addition-subtraction chains as a countermeasure against power attacks," CHES 2001, LNCS 2162, Springer-Verlag, 2001, pp. 39 ff.

[OS00] K. Okeya and K. Sakurai, "Power analysis breaks elliptic curve cryptosystems even secure against the timing attack," INDOCRYPT 2000, LNCS 1977, Springer-Verlag, 2000, pp. 178-190.

[QS01] J.J. Quisquater and D. Samyde, "ElectroMagnetic analysis (EMA): measures and counter-measures for smart cards," Smart Card Programming and Security (E-smart 2001), LNCS 2140, 2001, pp. 200-210.

[R96] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.

[S96] L. Smith, "An introduction to neural networks," October 1996; http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html.

[SEC2] Standards for Efficient Cryptography Group/Certicom Research, SEC 2: Recommended Elliptic Curve Cryptography Domain Parameters, Version 1.0, 2000.

[Tek] Tektronix Inc., User Manual: Digital Phosphor Oscilloscopes TDS7254, 2003.

[WW4] J. Waddle and D. Wagner, "Towards efficient second-order power analysis," CHES 2004, Springer-Verlag, 2004, pp. 1-15.

[YR01] K.Y. Yeung and W.L. Ruzzo, "Principal component analysis for clustering gene expression data," Bioinformatics, 17(9), 2001, pp. 763-774.