
Machine-Learning assisted Side-Channel Attacks on RNS-based Elliptic Curve Implementations

using Hybrid Feature Engineering

Naila Mukhtar 1, Louiza Papachristodoulou 2, Apostolos P. Fournaris 3, Lejla Batina 4, Yinan Kong 1

1 School of Engineering, Macquarie University, Australia
2 Fontys University of Applied Sciences, The Netherlands
3 Industrial Systems Institute/R.C. ATHENA, Greece
4 Institute for Computing and Information Sciences (ICIS), Radboud University, The Netherlands

Abstract. Side-channel attacks based on machine learning have recently been introduced to recover secret information from software and hardware implementations of mathematically secure algorithms. Convolutional Neural Networks (CNNs) have proven to outperform template attacks due to their ability to handle misalignment in the leakage traces of symmetric algorithms. However, one limitation of deep learning algorithms is the requirement of huge datasets for model training. For evaluation scenarios where only a limited number of leakage traces is available, simple machine learning with proper feature engineering, data splitting, and validation techniques can be more effective. Moreover, limited analysis exists for public-key algorithms, especially on non-traditional implementations such as those using the Residue Number System (RNS). Template attacks are successful on RNS-based Elliptic Curve Cryptography (ECC) only if the aligned portion of the traces is used to build the templates. In this study, we present a systematic methodology for the evaluation of ECC cryptosystems, with and without countermeasures, against machine learning side-channel attacks using two attack models. RNS-based ECC datasets have been evaluated using four machine learning classifiers, and a comparison with existing state-of-the-art template attacks is provided. Moreover, we analyze the impact of raw features and advanced hybrid feature engineering techniques, along with the effect of the splitting ratio. We discuss the metrics and procedures that can be used for accurate classification on imbalanced datasets. The experimental results demonstrate that, for RNS-based ECC datasets that are not very large, simple machine learning algorithms are more efficient than complex deep learning techniques.

Keywords: Elliptic Curve Cryptography, Side-Channel Attacks, Machine Learning, Feature Engineering


1 Introduction

Side-channel attacks (SCAs) constitute an ever-evolving technique for recovering secret information by exploiting the physical leakage of cryptographic implementations (e.g. power consumption, electromagnetic emanations, timing, vibrations (37; 19; 25)). From an information-theoretic point of view, profiled template attacks are among the most powerful SCAs. The attacker in such attacks is assumed to have access not just to the target device, but also to an open copy of it for the profiling phase. Having control of the secret information, he creates a leakage profile that he can later use to retrieve an unknown secret (not under his control) from leakage traces collected during a cryptographic operation (14). Recently, machine learning (ML) based side-channel attacks have been proposed as a direct extension of template attacks, extending the concept of leakage templates into trained ML models. These models can be used to predict secret information, thus providing an interconnection between the SCA and ML research fields (41; 27; 40). Furthermore, several researchers showed that machine learning and deep learning (DL) techniques, like Convolutional Neural Networks (CNNs), outperform traditional side-channel attacks since they are able to learn from misaligned data and, therefore, eliminate the need for pre-processing (12; 36). Picek et al. have evaluated the impact of various feature engineering techniques on profiled side-channel attacks on AES (46). Mukhtar et al. (43) have presented a side-channel leakage evaluation of protected and unprotected ECC Always-double-and-add implementations using machine learning classifiers and proposed using signal properties as features. Zaid et al. (53) have shown insights into the selection of features while building an efficient CNN architecture for side-channel attacks. However, while CNNs can improve the performance and efficiency of the attacks, a huge number of leakage traces is required to train such a model. Therefore, it can be discouraging for the attacker to use deep-learning techniques for SCA.

In the recent literature, there is a considerable amount of research work focused on ML and DL SCAs for symmetric-key algorithms. However, only a few researchers have grappled with the increased complexity and high number of samples per trace that exist in public-key cryptosystems (36; 51; 13), identifying a gap in attack analysis for public-key cryptographic algorithms. The few ML/DL-based evaluation analyses that exist for public-key cryptography do not yet consider the evaluation of cryptosystems in the presence of strong SCA countermeasures.

According to the no-free-lunch theorem, no two datasets will show the same results for the same classifier (52). Thus, the ML analysis on SCAs provided for some symmetric-key implementations, and even public-key cryptographic implementations (e.g. RSA), won't be of much use in other settings like ECC implementations. Additionally, the complexity of the ECC computations makes the well-known ML analysis concerns of under-fitting and over-fitting, occurring due to bias and variance in the data, very crucial. In fact, the machine might learn from the data so well, or so poorly, that it is unable to generalize to unseen data, thus making the training accuracy deceiving. To address these concerns, an optimal number of data traces needs to be identified, a proper data splitting strategy must be chosen, and appropriate feature engineering techniques must be applied. These activities, though, are hard to specify as the cryptographic computations become more elaborate and include strong SCA countermeasures. Thus, all the above issues highlight the need for a concrete methodology to analyze ECC implementation datasets for ML-based profiling SCAs, especially when such implementations have dedicated, strong SCA countermeasures.

Elliptic curve cryptographic primitives have been widely studied for their efficiency and SCA resistance, and many efficiency enhancement techniques and SCA countermeasures have been devised. Among them, several researchers have proposed using the Residue Number System (RNS) arithmetic representation as a way of decreasing the scalar multiplication computation delay (30; 42) by transforming all numbers to the RNS domain before performing finite field operations (6). In addition, RNS can be used to produce strong SCA countermeasures that can withstand simple and advanced SCAs (6) using the Leak Resistant Arithmetic (LRA) technique. Recently, a comprehensive study on RNS ECC implementations for Edwards curves (44), using the Test Vector Leakage Assessment (TVLA) methodology (26), showed that traditional SCA countermeasures, like base point randomization and scalar randomization, when combined with LRA-based RNS countermeasures, can considerably reduce information leakage. Also, in (44) it was proven that profiled template attacks on an RNS SCA-protected implementation are partially successful (using location-dependent and data-dependent leakage attacks), thus implying that more powerful attacks may be able to compromise the RNS SCA countermeasures (44; 24).

In this paper, a concrete methodology for assessing the machine learning SCA resistance of RNS-based ECC cryptosystems is proposed, realized in practice and analyzed in depth using various ML model algorithms and feature engineering techniques in order to achieve optimal results. This study could serve as a guideline for RNS-based RSA implementations as well. The methodology is able to reveal attack vulnerabilities even against noisy RNS-based implementations that include RNS and traditional SCA countermeasures. More specifically, we focus our evaluation plan on location-dependent and data-dependent leakage attacks (both with and without countermeasures). Our analysis includes several restrictions, such as misaligned and imbalanced datasets, as well as a restricted number of traces. Furthermore, a comparison of attack models using four machine learning classifiers is made. We also discuss the criteria for selecting optimized hyperparameters for each classifier. Once the optimally tuned model parameters are selected, further feature engineering techniques are applied to analyze the attack performance with a reduced number of features. In scenarios with a limited number of samples in the datasets, the data splitting ratio can be one of the factors affecting attack performance. Finally, we analyze the effect of three data splitting ratios on the overall attack performance. Analytically, the novelty of the paper is the following:

– A six-stage methodology for launching a practical machine-learning based side-channel attack is proposed. Our analysis is based on assessing the SCA resistance of an RNS-based ECC implementation with and without countermeasures. For the first time in the research literature, the effectiveness of a combination of RNS and traditional SCA countermeasures on an RNS ECC implementation against machine-learning based side-channel attacks is presented.

– Machine learning based side-channel attacks are presented for location- and data-dependent leakage models using four machine learning classifiers. For each classifier, hyperparameter tuning has been performed to extract the best-trained model for the underlying problem. Results are presented using standard machine-learning evaluation metrics. The implications of relying on the classification accuracy alone, in the case of imbalanced data, are also discussed.

– Various state-of-the-art hybrid feature engineering techniques, which have been proven to offer performance improvements in other domains, are tested on side-channel leakage traces from RNS-based ECC implementations. Three hybrid feature engineering approaches are proposed in order to handle the complexity of public-key cryptographic traces. The impact of dimensionality reduction, along with filter and wrapper feature selection methods, is observed.

– This work also investigates the effect of data splitting and validation folds on the attack efficiency for the RNS-ECC datasets.

– An RNS-based ECC implementation yields a challenging dataset, due to the intrinsic parallelism of RNS operations. For RNS-based implementations, existing traditional template attacks are successful only if the aligned portion of the traces is used for the attack. This limitation makes the attack difficult to launch. However, in this study, a quantitative analysis is performed to analyze the success of the machine learning-based attack when using either the full trace length or only the aligned part for training the model.

The rest of the paper is organized as follows. Section 2 presents the classifiers used for evaluation along with the algorithm under attack. Section 3 explains the attack methodology along with the other evaluation strategies and datasets used for evaluation. Section 4 presents the results on the RNS-based ECC leakage datasets. Section 5 concludes the paper.

2 Preliminaries and Related Literature

2.1 Potentials of RNS as Side-Channel Attack Countermeasure

The Residue Number System (RNS) is a non-positional arithmetic representation, where a number X is represented by a set of n residues x_i (X →_RNS (x_1, x_2, ..., x_n)) with respect to a given RNS base B : (m_1, m_2, ..., m_n), as long as 0 ≤ X < M, where M = \prod_{i=1}^{n} m_i is the RNS dynamic range and all m_i are pairwise relatively prime. Each x_i can be derived from X by calculating x_i = \langle X \rangle_{m_i} = X mod m_i. Since RNS can effectively represent elements of cyclic groups or finite fields, there is merit in adopting it in the finite field operations underlying elliptic curves. RNS hardware implementations of Montgomery multiplication for elliptic curves (4) and RSA (17) showed that RNS usage can increase scalar multiplication efficiency. Furthermore, RNS can be used to design SCA countermeasures, as observed in several research papers, for instance Bajard et al. (6; 5), Guillermin (30), and Fournaris et al. (22; 23). The RNS parallel processing of finite field operations, apart from speed, also offers a different representation of the elliptic curve points, which may reduce SCA leakage. Also, RNS is a non-positional system (a single bit change in one of an RNS number's residues can lead to considerable changes in the binary representation of the finite field element), which intrinsically increases noise in the computational process (42). Furthermore, in (6), the Leak Resistant Arithmetic (LRA) technique was proposed, where it was proven that creating a big pool of RNS base moduli (at least 2 × n), randomly choosing some of them to act as an RNS base for representing finite field elements during a specific computation flow, and randomly permuting this RNS base, can be a potent SCA countermeasure. LRA has been applied to modular exponentiation designs in two ways: either by choosing a new base permutation once at the beginning of each scalar multiplication, or by performing a random base permutation in each scalar multiplication round (23). In this paper, the second approach is adopted.
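To make the forward and reverse RNS conversions above concrete, here is a minimal Python sketch, assuming a toy pairwise-coprime base; the helper names and parameter values are illustrative and not taken from the evaluated implementation.

```python
from functools import reduce

def to_rns(x, base):
    """Forward conversion: residue x_i = x mod m_i for each modulus in the base."""
    return [x % m for m in base]

def from_rns(residues, base):
    """Reverse conversion via the Chinese Remainder Theorem (base moduli must be coprime)."""
    M = reduce(lambda a, b: a * b, base)   # dynamic range M = prod(m_i)
    x = 0
    for r, m in zip(residues, base):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)       # pow(Mi, -1, m) is the inverse of Mi mod m
    return x % M

base = (13, 17, 19, 23)                    # toy pairwise-coprime RNS base
x = 31415
assert from_rns(to_rns(x, base), base) == x
```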

2.2 RNS-based ECC Scalar Multiplication

The ECC scalar multiplication algorithm evaluated in this paper is based on a variation of the Montgomery Powering Ladder (MPL) for elliptic curves over GF(p) (35). Algorithm 1 uses the LRA technique by choosing a random base permutation γ_i and transforming all GF(p) elements into this permutation in each MPL round i. After the end of the round, the algorithm chooses a different base permutation for the next round. This RNS SCA countermeasure is enhanced with the base point V randomization technique using an initial random point R (24). All GF(p) multiplications used in EC point addition and doubling are done using the RNS Montgomery multiplication (6). Apart from the above countermeasures, as proposed in (44), an RNS operation random sequence approach is also followed, i.e. the individual moduli operations for each RNS addition, subtraction or multiplication are executed in a random sequence. Furthermore, scalar randomization is used as a countermeasure. This is based on the concept of computing random multiples r of the order of the curve #E instead of computing the scalar multiplication [e]P directly (i.e. one can compute the same point as [e + r#E]P). The bits of the scalar e are masked using a different random value r at each SM execution. In order to evaluate the potential of the above countermeasures, four variants of the algorithm were implemented, with different countermeasures activated each time.
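As a minimal sketch of the scalar randomization (blinding) step just described, the following Python snippet computes the masked scalar e + r·#E; the curve order shown (the NIST P-256 group order) and all variable names are illustrative placeholders, not the parameters of the evaluated implementation.

```python
import secrets

def blind_scalar(e, curve_order, r_bits=32):
    """Scalar blinding: [e]P == [e + r*#E]P, since [#E]P is the point at infinity."""
    r = secrets.randbits(r_bits)           # fresh random mask per scalar multiplication
    return e + r * curve_order             # blinded scalar processed by the ladder

# Toy usage: each execution processes a different bit pattern for the same secret e.
example_order = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
e = 0x1234567890ABCDEF
e_blinded = blind_scalar(e, example_order)
```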

2.3 Machine Learning Algorithms

In this paper, four different classifiers are used to create trained ML/DL models of the Device Under Test (DUT) leakage information: Support Vector Machine, Random Forest, Multi-Layer Perceptron and Convolutional Neural Networks. In this subsection, each classifier is briefly described, the parameters that were identified as important for profiling SCAs are specified, and the main benefits of each classifier are presented.


Algorithm 1: LRA SCA-FA Blinded MPL (21)

Input: V, R ∈ E(GF(p)), e = (e_{t-1}, e_{t-2}, ..., e_0)
Output: e · V, or a random value (in case of faults)

 1: Choose a random initial base permutation γ_t
 2: Transform V, R to RNS format using the γ_t permutation
 3: R0 = R, R1 = R + V, R2 = -R
 4: Convert R0, R1, R2 to Montgomery format
 5: for i = t - 1 down to 0 do
 6:     R2 = 2R2, performed in permutation γ_t
 7:     Choose a random base permutation γ_i
 8:     Random base permutation transformation from γ_{i+1} to γ_i for R0 and R1
 9:     if e_i = 1 then
10:         R0 = R0 + R1 and R1 = 2R1 in permutation γ_i
11:     end
12:     else
13:         R1 = R0 + R1 and R0 = 2R0 in permutation γ_i
14:     end
15:     Random base permutation transformation from γ_i to γ_t for V
16: end
17: if (i, e not modified and R0 + V = R1) then
18:     Random base permutation transformation from γ_0 to γ_t for R0
19:     return R0 + R2 in permutation γ_t
20: end
21: else
22:     return random value
23: end


Support Vector Machine (SVM) Support Vector Machines (SVMs) are one of the most popular algorithms used for classification problems in different application domains, including side-channel analysis (54; 31; 18). In SVM, n-dimensional data is separated using a hyperplane, by computing and adjusting the coefficients to find the maximum-margin hyperplane which best separates the target classes. Often, real-world data is very complex and cannot be separated with a linear hyperplane. For learning hyperplanes in complex problems, the training instances, or support vectors, are transformed into another dimension using kernels. There are three widely used SVM kernels: linear, radial and polynomial. To tune the kernels, hyperparameters like 'gamma' and the cost 'C' play a vital role. The parameter 'C' acts as a regularization parameter in SVM and helps in adjusting the margin distance from the hyperplane; thus, it controls the cost of misclassification. The parameter 'gamma' controls the spread of the Gaussian curve. Low values of 'C' give more variance and lower bias, whereas higher values of 'C' give lower variance and higher bias. Similarly, a higher gamma may lead to better training accuracy but results in a biased model. To find optimum values of 'C' and 'gamma', grid search or other optimization methods are applied.
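For reference, a minimal grid-search sketch over 'C', 'gamma' and the kernel with scikit-learn; the search ranges mirror Table 2, while the dataset variables are placeholders rather than the paper's traces.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_train: one row per leakage trace (samples as features), y_train: key-bit labels (0/1).
X_train = np.random.randn(200, 700)        # placeholder traces
y_train = np.random.randint(0, 2, 200)     # placeholder labels

param_grid = {
    "C": [0.1, 0.01, 0.5, 1.0],
    "gamma": [1, 10, 30, 40, 50],
    "kernel": ["poly", "sigmoid", "rbf"],
}
# Cross-validated grid search; the same pattern can be reused for RF and MLP.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```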

Random Forest (RF) In a Random Forest (RF), a forest is formed by aggregating a collection of decision trees (11). The results of the individual decision trees are combined to predict the final class value. RF uses unpruned trees, avoids over-fitting by design, and reduces the bias error. Efficient modeling using random forests highly depends on the number of trees in the forest and the depth of each tree. These two parameters have been tuned for an efficient model in this study.

Multi-Layer Perceptron (MLP) The Multi-Layer Perceptron (MLP) is a basic feed-forward artificial neural network that uses back-propagation for learning and consists of three layers: an input layer, a hidden layer, and an output layer (49). The input layer connects to the input feature variables and the output layer returns the predicted class value. To learn patterns from non-linear data, a non-linear activation function is used. Due to the non-linear nature of side-channel leakage, MLP appears to be a suitable choice for recovering secret information by learning patterns in the signals.

Convolutional Neural Network (CNN) A Convolutional Neural Network (CNN) is a type of neural network which consists of convolutional layers, activation layers, a flatten layer, and pooling layers. A convolutional layer performs convolution on the input features, using filters, to recognize patterns in the data (39). The pooling layer is a non-linear layer whose function is to reduce the spatial size and hence the number of parameters. Fully connected layers combine the features, just like in an MLP. There are certain hyperparameters related to each layer which can be optimized for an efficient trained model; these include the learning rate, batch size, number of epochs, optimizer, activation functions, etc. In addition, there are a few model hyperparameters which can be used to design an efficient architecture. It should be noted that the purpose of this study is not to propose a CNN architecture design, but to analyze and test an existing proposed CNN design on the RNS-based ECC dataset. Therefore, the focus is on tuning the optimization hyperparameters rather than the model hyperparameters.
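As an illustration of this kind of 1D-CNN trace classifier, the following is a minimal Keras sketch for binary key-bit classification of fixed-length traces; the layer sizes and filter counts are illustrative assumptions, not the architecture evaluated in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

TRACE_LEN = 700   # samples per trace (e.g. the data-dependent datasets)

# Small 1D-CNN: conv -> pool -> conv -> pool -> dense, sigmoid output for the key bit.
model = keras.Sequential([
    layers.Input(shape=(TRACE_LEN, 1)),
    layers.Conv1D(8, kernel_size=11, activation="relu"),
    layers.AveragePooling1D(pool_size=2),
    layers.Conv1D(16, kernel_size=11, activation="relu"),
    layers.AveragePooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train[..., None], y_train, epochs=300, batch_size=32, validation_split=0.1)
```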

2.4 Feature Engineering Techniques

Features play a key role in accurate machine learning analysis. The sample values in a trace T represent the features. It is evident from previous research that, when it comes to features in the training dataset, more is not necessarily better. Feature reduction/extraction techniques have a distinct effect on machine learning algorithms. Redundant features can give rise to over-fitting and hence result in an inaccurate analysis. To eliminate unnecessary data features, feature engineering techniques are used (10). There are three main benefits of performing feature engineering to select the most contributing features: it mitigates the over-fitting problem, gives a simpler and more accurate model, and improves computational efficiency.

Generally, a machine learning model can be represented by eq. (1), where F represents the feature matrix and w represents the weights learnt during the learning steps, which are used for predicting the class of unseen values.

y_i = w_0 + \sum_{j=1}^{F_n} F_{ij} w_j \qquad (1)

A massive set of features can confuse the model during the learning process. In this paper, our goal is to reduce the large number of features and create an efficient, effective and accurate machine learning model for the RNS ECC data. In all cases, a number of features Fm is selected from a pool of features Fn, where inequality (2) holds.

Fm < Fn (2)

Feature Extraction In feature extraction techniques, a new feature dataset is formed based on the existing feature dataset; more precisely, the dimensionality of the data is reduced. Based on the transformation method being used, feature extraction can be categorized into linear and non-linear transformations. Two well-known techniques for feature extraction are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Principal Component Analysis is a statistical procedure that reduces the dimensionality of the data using an orthogonal transformation, while retaining the maximum variance and internal structure of the samples (34). However, the subspace vectors in the low-dimensional space might not be optimal, as PCA does not take the sample classes into account. LDA is a supervised dimensionality reduction technique, in which the distance between the means of the classes is maximized by projecting the input data onto a linear subspace (8; 50). It helps in reducing the overlap between the target classes. PCA has been used for traditional side-channel leakage analysis and has also been used for feature extraction in machine learning analysis (7; 28). However, the effect of dimensionality reduction on RNS-based ECC implementation datasets has not been analyzed.
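A minimal scikit-learn sketch of the two extraction techniques discussed above; the component counts and data variables are illustrative placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.randn(300, 700)        # placeholder traces (rows) x samples (features)
y = np.random.randint(0, 2, 300)     # placeholder key-bit labels

pca = PCA(n_components=50)           # unsupervised: keep the 50 highest-variance components
X_pca = pca.fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=1)  # supervised: binary classes -> 1 dimension
X_lda = lda.fit_transform(X, y)
```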

Feature Selection In feature selection techniques, a new feature dataset is formed by selecting the most contributing features from the existing feature set. There are three main approaches to feature selection: filter, wrapper and embedded methods. In this study, feature datasets are formed using filter methods, wrapper methods and a hybrid approach based on both. In filter methods, features are selected based on their intrinsic properties and relevance, using univariate statistical analysis (33). The filter methods used in this study are the Chi-Square test (Chi2), Pearson's Correlation Coefficient (PCorr), Mutual Information (MI), the F-test, and the T-test. In wrapper methods, classifiers are used to measure the usefulness of the features using cross-validation. In this feature selection technique, optimal features are selected based on the algorithm performance by iteratively using a search algorithm (38). In this study, Recursive Feature Elimination using Random Forest (RFE-RF) and sequential feature selection using Random Forest importance (RF-Imp) are used.
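The following sketch shows one filter method (the F-test via SelectKBest) and one wrapper method (RFE with a Random Forest) from the lists above, using scikit-learn; the feature counts and data are placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(300, 700)        # placeholder traces
y = np.random.randint(0, 2, 300)     # placeholder key-bit labels

# Filter method: rank features with the ANOVA F-test and keep the top 100.
X_filt = SelectKBest(score_func=f_classif, k=100).fit_transform(X, y)

# Wrapper method: recursively eliminate features using Random Forest importance (RFE-RF).
rfe = RFE(RandomForestClassifier(n_estimators=100), n_features_to_select=20, step=10)
X_rfe = rfe.fit_transform(X_filt, y)
```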

3 Machine Learning based Evaluation Methodology for ECC RNS Scalar Multiplication

For an Elliptic Curve Cryptography (ECC)-based cryptosystem, the main target of SCAs is the scalar multiplication (SM), and more precisely, in our case, the scalar multiplication in the Montgomery Powering Ladder (MPL). The RNS approach introduces significant differences in the finite field computations performed in each point operation (48), which impacts the side-channel trace. Enhancing this approach with traditional and RNS SCA countermeasures makes both SCA attacks and SCA assessment hard to implement. Based on the work of (44), profiling attacks are the only SCAs that can, and only partially, compromise an SCA-resistant RNS ECC SM implementation (using data-dependent and location-dependent template attacks). However, there is no indication whether such RNS implementations (protected or unprotected) can withstand potent ML-based profiled SCAs. Thus, in this paper, the template attack approaches have been extended to utilize the pattern learning capability of machine learning algorithms, in order to evaluate the amount of secret information that can be recovered. For recovering a secret key bit, the ML-based attack formulation leads to a binary classification problem. The need for a solid tailor-made methodology to assess an RNS ECC SM implementation stems from the unique characteristics of the SM under attack, combined with the fact that ML models are adapted to the problem at hand. In this paper, such a concrete ML-based profiling SCA methodology is proposed and analyzed in detail. Initially, we collect leakage traces following specific attack scenarios that match possible leakage of the RNS ECC SM implementation. Then the collected raw data are aligned and cleaned from noise using pre-processing, and then split into separate training and testing datasets. Both datasets are separately processed using feature engineering techniques. The reduced-feature training dataset is used to train the machine learning model, and the reduced-feature testing dataset is used to test the trained model for the recovery of the scalar key bits by predicting the key-bit class. An overview of the complete methodology is given in Fig. 1. The methodology is split into the following six distinct stages:

1. Attack Scenario Specification: This constitutes the first stage of the methodology plan. In this stage, the possible targets of the ECC RNS SM algorithm are identified. More specifically, as in all SM MPL variations, the most evident information leakage can be observed from the scalar-bit-dependent sample difference when updating the R0 or R1 storage areas, and/or the scalar-bit-dependent trace difference when the point doubling operation is executed on R0 or R1. Following the approach carried out in (44), two attack scenarios can be identified for ML SCAs: data-dependent attacks and location-dependent attacks. It should be pointed out that the RNS structure of all involved numbers (each number is split into several independent moduli) makes the power or EM variations due to different memory storage more complex, since the point coordinates are no longer single numbers handled by a big-number software library (which may lead to R0 or R1 being stored in contiguous memory blocks) but, on the contrary, small numbers that may be stored independently in memory.

2. Raw Trace Preprocessing Mechanism: The fact that RNS operations are performed as individual, autonomous moduli operations, thus triggering execution optimizations (parallel processing, pipelining etc.), along with the fact that the Algorithm 1 RNS ECC SM implementation has several powerful SCA countermeasures and that software implementations lead to noisy and misaligned traces, highlights the need for a trace preprocessing stage before the traces are used for ML model training and profiled attacking.

3. Data Splitting: At this stage, the preprocessed collected raw data are split into separate training and testing datasets. In side-channel data analysis, the available leakage traces might be limited. Splitting the data with a 50-50 ratio might produce a very small training dataset, and insufficient training traces might result in an over-fitted or under-fitted model. On the other hand, too small a testing dataset might not evaluate the trained model correctly. A trade-off value is required to train and test the model. To cater for this real-world side-channel analysis limitation, at this stage the appropriate data splitting is studied: the impact of different data splitting ratios for training and testing data is analyzed and the best data split ratio is determined.

4. Feature Selection and Processing: Another important aspect of machine learning analysis is the features. Redundant features can lead to over-fitting and the curse of dimensionality, which ultimately results in an inaccurate model. At this stage, appropriate feature engineering techniques and feature processing combination models are proposed in order to choose the optimal features for ML model training. Also, a set of designed experiments is proposed in order to test the proposed feature processing combination models.

5. ML Classification Model Training: At this stage, the ML classifier models are trained using an optimal set of parameters. The machine-learning algorithms described in Sect. 2.3 are used at this stage, i.e. Support Vector Machines (SVM), Random Forest (RF), Multi-Layer Perceptron (MLP) and Convolutional Neural Networks (CNN). The algorithms have been tuned to achieve the best performance.

6. Key Prediction: The final stage of the overall methodology is devoted to the usage of the trained ML models on the testing trace set in order to evaluate the SCA resistance of the RNS ECC SM implementation against ML profiling attacks.


Fig. 1: Machine Learning based Evaluation Methodology for RNS-ECC

In the following subsections, the methodology stages are described in more detail. Also, we propose how each stage should be used in order to analyze and assess the ML-SCA resistance of Algorithm 1 with and without the presence of countermeasures. The parameter settings used for the algorithm under study are mentioned in each stage.

3.1 Trace Collection Experimental Setup

All trace datasets for the following analysis are collected by executing the Algorithm 1 RNS-based ECC SM implementation (in two variants) on a BeagleBone Black, which uses an ARM Cortex-A8 processor operating at 1 GHz. Samples were collected using a Langer EMV probe LF B-1, H field (100 kHz - 50 MHz), and a LeCroy WaveRunner 8404M-MS with a 2.5 GS/s sampling rate.

The ECC RNS SM Algorithm 1 implementation was taken from a public repository (21) and was customized according to the data collection and attack scenario requirements determined in the proposed methodology stages. For data collection and formatting, Matlab R2019 and Inspector 4.12, provided by Riscure, were used (1). For the machine learning analysis, a Python environment with the Keras and Scikit-learn libraries has been used (16). All feature selection/extraction methods have been taken from Scikit-learn (45), except the T-test, which was implemented in-house.

To meet the computationally intensive needs of the machine learning algorithms, the NCI (National Computational Infrastructure) Australia high-performance supercomputing server has been used (2).

3.2 Attack Scenarios Specification

Machine Learning based Data-dependent Leakage Analysis (MLDA) In the data-dependent attack scenario, the adversary can monitor the power or electromagnetic emission (EM) fluctuations due to the processing of a different value of the i-th scalar bit e_i. This is reflected in the processor instructions corresponding to line 9 of the ECC scalar multiplication algorithm (Alg. 1), where the performed operations depend on the value of the secret key bit e_i, resulting in the registers R0 and R1 being updated differently. R0 contains the addition result and R1 contains the doubling result if the scalar secret key bit e_i = 1, and in reverse order if e_i = 0 (R1: addition, R0: doubling). Since the data determine the register that is used, and therefore cause the leakage, we refer to this analysis as "data-dependent leakage". Such data leakages should also be observable with protected scalar-bit countermeasures, if the scalar bits under attack are retrieved from a memory location in clear view.

For the purpose of the analysis, we have collected the leakage traces of the first few Algorithm 1 rounds for a 233-bit scalar. As explained, the data leakage LD is labeled '1' if the scalar bit e_i = 1 in round i, and is labeled '0' otherwise. Only one instruction was observed and 50k traces, each of 700 samples, were collected; out of these, around 3k-7k were utilized after alignment in the other stages of the proposed methodology.

Machine Learning based Location-dependent Leakage Analysis (MLLA) In the location-dependent attack scenario, key-dependent instruction leakages are exploited by utilizing the storage structure information. More precisely, it is assumed that, based on the storage content, the leakages for a particular operation will be distinguishable. It can be observed that in each round i of Algorithm 1 only two operations have key-dependent instructions, namely addition and doubling. Both operations are performed in the same order, irrespective of the value of the scalar key bit e_i. However, the storage content differs according to the scalar bit value: the storage register R0 is doubled when the scalar key bit is '0', otherwise R1 is doubled. Based on the fact that there is no memory address randomization, we can exploit this vulnerability by collecting the leakage data for the doubling operation. The data will be labeled and classified based on the content of the storage registers R0 and R1. Such memory access leakage has also been exploited for RNS-based RSA in (29). Papachristodoulou et al. (44) have exploited a similar vulnerability for ECC SM by utilizing a small window of 451 samples (out of 3k samples per trace) for training and classification in template profiling SCAs. Identifying the specific samples for training purposes requires more in-depth knowledge of the underlying system and a lot of signal processing, which might be discouraging for the attacker. The work of Andrikos et al. performed location-based attacks using machine/deep learning, but those were focused on accessing different SRAM locations and are not algorithm-specific (3). In our work, we have used the ML approach to classify the scalar key bit e_i, exploiting the doubling operation leakage, by using the whole trace rather than the small portion of 451 samples. We have achieved similar results, which shows that the machine learning attack is realistic and practical from an attacker's point of view. For the location-based analysis, we have labeled the leakage data LD as '0' if R0 is doubled and as '1' if R1 is doubled. We collected 50k traces (each 3k samples long), out of which 14k traces are used after stage 2 (preprocessing) of the proposed methodology.

Datasets For a detailed evaluation of an RNS ECC SM approach against the above two ML-based attack scenarios, all potential countermeasures that can be applied to the implementation should be evaluated using the proposed RNS ECC SM evaluation methodology. To achieve that, two implementation variants of the Algorithm 1 SM can be identified for each ML attack scenario, one with all SCA countermeasures enabled (protected version) and one with all SCA countermeasures disabled (unprotected version). In line with the above rationale, for the evaluation of Algorithm 1 the trace datasets of Table 1 are identified, denoted and collected.

Table 1: Trace Dataset Categories

Name | Countermeasures | Notation
Protected Data Dependent Leakages | RNS LRA technique, base point randomization, scalar randomization countermeasure and random RNS operation sequence | DDP
Unprotected Data Dependent Leakages | no countermeasure | DDUP
Protected Location Dependent Leakages | RNS LRA technique, base point randomization, scalar randomization countermeasure and random RNS operation sequence | DLP
Unprotected Location Dependent Leakages | no countermeasure | DLUP


3.3 Raw Trace dataset Pre-processing

Trace Alignment Alignment plays an important role when using machine learning techniques, especially on raw leakage samples. With raw leakage samples, or raw instances, each data point in a particular trace is treated as a feature and the feature columns are used to train the model. Having misaligned features might scatter the useful feature information across the columns, making it difficult for the ML classifier to learn from the scattered, haphazard data. Misalignment generally occurs due to the noise of neighboring components in the device; however, in some cases noise is intentionally induced into the system as a countermeasure to increase side-channel attack resistance. Executing a software implementation on an embedded operating system (as in this paper's experimental setup) results in traces containing noise that is unexpectedly added by the other processes of the operating system. Common signal processing techniques, like low-pass or band-pass filters, can be used to reduce this noise. For the collected traces, a low-pass filter approach was chosen. Initially, the dominant frequencies are measured using the Fast Fourier Transform (FFT), as shown in Fig. 2, and it is observed that most of the energy lies between 0-300 MHz, with the highest frequency at 1 GHz. Based on this observation, a low-pass filter is applied and the resulting clear patterns are used for alignment.

Fig. 2: Fast Fourier Transform (FFT) of the leakage samples
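A minimal SciPy sketch of this preprocessing step (inspecting the spectrum with an FFT, then applying a low-pass filter); the sampling rate matches the acquisition setup, while the trace data, cutoff frequency and filter order are illustrative assumptions.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import butter, filtfilt

fs = 2.5e9                                   # 2.5 GS/s sampling rate
trace = np.random.randn(3000)                # placeholder raw leakage trace

# Inspect the dominant frequencies of the raw trace.
spectrum = np.abs(rfft(trace))
freqs = rfftfreq(trace.size, d=1 / fs)

# Zero-phase low-pass Butterworth filter with an illustrative 300 MHz cutoff.
b, a = butter(N=4, Wn=300e6, btype="low", fs=fs)
trace_filtered = filtfilt(b, a, trace)
```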

Skewed or Imbalanced Datasets For a well-performing trained model, it is imperative to have a balanced dataset. A skewed or imbalanced dataset is one in which there are more traces for one class label than for the other. The trained model will be biased towards the dominating class and will not be able to classify the unseen data accurately. To emulate the problem of imbalance and observe its impact in the experimental process of the ECC RNS SM assessment, after the traces were collected and aligned, we produced both balanced and imbalanced dataset outcomes. Datasets DDP, DDUP and DLP were almost balanced, having approximately 1050, 1500, and 3800 traces (for both 1's and 0's), respectively.


These three datasets provide nearly ideal balanced data for modeling. However, the dataset DLUP traces were collected to be highly skewed, i.e. the number of traces for class key bit '0' was much higher than for class key bit '1' (10150 and 42 traces, respectively). To handle the skewness and minimize its impact, the Synthetic Minority Oversampling Technique (SMOTE) was used, as it has outperformed other techniques on other cryptographic datasets (47). SMOTE synthesizes new instances for the minority class and balances the data (15).
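A minimal sketch of this balancing step with the imbalanced-learn library; the dataset shape and class ratio are placeholders.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# Highly skewed placeholder dataset: many class-0 traces, few class-1 traces.
X = np.random.randn(1000, 3000)
y = np.array([0] * 990 + [1] * 10)

# SMOTE synthesizes new minority-class traces by interpolating between nearest neighbours.
X_bal, y_bal = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
```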

3.4 Data Splitting and Validation Strategy

Machine learning-based side-channel attacks are based on the template attack approach. In template attacks, two datasets are used: the template and the test dataset. The template dataset (pre-defined examples) is used to train the system, and the test (unknown) dataset is then used to evaluate the attack (14). Similarly, in ML SCAs, the leakage dataset LD is divided into the training dataset DTrain, which is used to train the machine learning model, and the test dataset DTest. Unlike template attacks, though, another dataset is introduced in ML analysis, known as the validation dataset DVal. In this stage of the methodology, the above dataset splitting and the role of each subset is analyzed below:

– The DTrain dataset is used during the model fitting process and helps the model learn the patterns from the data.

– During the evaluation, DVal is used to fine-tune the model hyperparameters. The model never directly learns from the validation data, but it can occasionally see the data during the learning process. Hence it provides a biased evaluation, and the model structure is changed based on the validation results.

– The DTest dataset is completely unknown to the system and is never used in the training process. DTest provides an unbiased evaluation of the model.

One of the important aspects in machine learning is to decide the splitting ratio of the training, validation, and testing sets. The bigger the training dataset, the better the trained model will be. This becomes a major problem, especially with datasets having a small number of instances (traces). To evaluate the effect of the data division on secret information recovery, three proportions are tested in this paper. The ratios used for the training and testing datasets are 90-10%, 80-20%, and 50-50%. The datasets are shuffled before splitting in order to spread the instances in the space.

At this methodology stage, we suggest the use of k-fold cross-validation, which is a resampling procedure used for the evaluation of a trained machine learning model. After the initial dataset split into training and testing sets, the training dataset is further split, using a k-fold validation scheme, into training and validation parts. In this validation procedure, the data samples are split into k groups: one group is the holdout or validation set, and the rest of the data is used for training the model. The model is fitted on the training groups and evaluated on the holdout/validation set. This ensures that the whole dataset undergoes a proper validation process. For k-fold validation, 5 and 10 folds are the most commonly recommended values, as they give neither high variance nor high bias in the resulting validation error estimate (32). However, a high number of validation folds can lead to increased training time. This processing time can be reduced by using an optimal number of folds while still obtaining a reliably trained model. For our analysis, we have used three validation fold settings, namely 3, 5 and 10, to infer the best-performing number of validation folds for the RNS-ECC SM datasets.
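A minimal sketch of the splitting and validation strategy described above, using scikit-learn; the 80-20 split, fold count and data are illustrative placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(3000, 700)       # placeholder traces
y = np.random.randint(0, 2, 3000)    # placeholder key-bit labels

# Shuffled 80-20 train/test split (the 90-10 and 50-50 ratios only change test_size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0)

# k-fold cross-validation on the training set only (k = 3, 5 or 10 in the experiments).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X_train, y_train, cv=cv)
print(scores.mean())
```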

3.5 Feature Processing and Engineering: Proposed Hybrid Approaches

In the ECC RNS SM evaluation methodology, the feature engineering techniques for feature selection and processing described in subsection 2.4 are adopted. In this stage, we propose an analysis approach to deduce the impact of feature selection/extraction techniques on the machine learning model for RNS-based ECC data classification, in three different experimental setups. In the first experimental setup, feature engineering techniques are applied to the raw data samples to reduce the number of features, and then the machine learning model is trained. In the second experiment, one of the filter methods is applied to get the highest-ranked features, and then PCA is applied to transform the data dimensions.

Considering that the prominent characteristics of two or more feature extraction/selection techniques can be combined to improve the learning performance and efficiency, at this stage of the evaluation analysis we expand the feature engineering of the previous paragraph and propose a hybrid feature approach that can help in recognizing, in less time, better features that contribute the most towards the accuracy. In this research work, we propose and test the following three approaches for the experimental ECC RNS SM evaluation of Algorithm 1.

– Approach A: In the first approach, the feature dataset is processed using the feature selection and extraction methods of subsection 2.4. The methods used for this analysis are the F-test, T-test, Chi2, MI, PCorr, PCA, Recursive Feature Elimination using Random Forest (RFE-RF), and feature selection using Random Forest importance (RF-Imp). There are Fn total features for the location-dependent leakages (MLLA) and the data-dependent leakages (MLDA). Out of Fn, Fm features are selected. The selected output features are directly given as input to the machine learning models for training.

– Approach B: In the second approach, the feature datasets are processed (Tier 1) using the filter methods (F-test, T-test, Chi2, MI, PCorr), and the output features are further reduced (Tier 2) using the PCA and LDA dimensionality reduction techniques. For the Tier 1 feature selection, Fm features are selected from the pool of Fn features, for both MLLA and MLDA. For Tier 2, Fo PCA components (features) are selected from the Fm-feature dataset. For binary classification, LDA projects the Fm features onto one dimension.

– Approach C: In the third approach, features processed through the filter methods are further reduced using recursive feature selection methods. The filter methods rank the features according to their relevance, and features are then further selected based on the classifier algorithm performance. For Tier 1, the filter methods are applied to reduce the features from Fn to Fm, for both MLLA and MLDA. For Tier 2, the Fm features are further fed to the RFE-RF and RF-Imp wrappers to select a subset containing Fo features. The RFE-RF and RF-Imp methods recursively eliminate the redundant features which do not contribute towards classification.

Fig. 3: Hybrid Feature Engineering Approaches

Our proposed approaches help in tackling the drawbacks of filter methods and wrapper methods. In filter methods, the target response class is not involved in the selection process. To involve the target class, the relevant uncorrelated features are selected using filter methods and are then further reduced by recursively searching through the feature pool. In the experimental analysis, recursively selecting features out of 3k or 700 features is computationally very expensive and involves redundant processing, as most of the features do not contribute towards accuracy at all. This approach helps in eliminating the least correlated, redundant features and thus reduces the time required for recursive feature selection. A graphical description of the proposed approaches is presented in Fig. 3.
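A minimal sketch of the two-tier idea behind Approaches B and C as scikit-learn pipelines; the Tier 1/Tier 2 feature counts (Fm, Fo), the classifier settings and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X = np.random.randn(500, 3000)       # placeholder traces
y = np.random.randint(0, 2, 500)     # placeholder key-bit labels

# Approach B: Tier 1 filter (F-test) -> Tier 2 PCA -> classifier.
approach_b = Pipeline([
    ("tier1_filter", SelectKBest(f_classif, k=200)),   # Fm = 200
    ("tier2_pca", PCA(n_components=20)),               # Fo = 20
    ("clf", SVC(kernel="rbf", C=1.0, gamma=40)),
])

# Approach C: Tier 1 filter (F-test) -> Tier 2 RFE with Random Forest -> classifier.
approach_c = Pipeline([
    ("tier1_filter", SelectKBest(f_classif, k=200)),
    ("tier2_rfe", RFE(RandomForestClassifier(n_estimators=100), n_features_to_select=20)),
    ("clf", SVC(kernel="rbf", C=1.0, gamma=40)),
])

approach_b.fit(X, y)
approach_c.fit(X, y)
```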

3.6 ML Model Training: Parameter Tuning

At this stage of the ECC RNS SM evaluation methodology, the ML models are trained using the features selected from the hybrid feature engineering process. The four classification algorithms described in Sect. 2 are used to evaluate the effectiveness of the location-dependent and data-dependent attacks, and also to evaluate the performance of the feature subsets, i.e. Support Vector Machines (SVM), Random Forest (RF), Multi-Layer Perceptron (MLP) and Convolutional Neural Networks (CNN). There are certain parameters in each classifier algorithm, as mentioned in Sect. 2.3, that need tuning. For the systematic evaluation of RNS-ECC SM, the hyperparameters are tuned using grid search to obtain the best possible trained model. The tuned hyperparameters are shown in Table 2.

Table 2: Parameter tuning for SVM, RF, MLP and CNN

Classifier | Parameter | Value Range
SVM | C | [0.1, 0.01, 0.5, 1.0]
SVM | gamma | [1, 10, 30, 40, 50]
SVM | kernel | [Poly, Sigmoid, RBF]
MLP | Learning Rate | [0.001, 0.0001]
MLP | Solver | [adam, sgd]
MLP | Batch Size | [32]
MLP | Activation Function | [tanh, relu, identity, logistic]
MLP | Epochs | [200]
RF | Trees Depth | [5, 10, 20, 30]
RF | Number of Trees | [10, 50, 100, 200]
CNN | Learning Rate | [0.001, 0.01, 0.1, 0.5]
CNN | Epochs | [300]
CNN | Activation Function | [relu, selu, elu]
CNN | Optimizer | [Adam, Nadam, RMSprop, Adamax]
CNN | Init Mode | [uniform, normal]
CNN | Batch Size | [32, 100, 400]

4 Results and Discussions

Following the proposed methodology and the experimental process described in Sect. 3.1 for the ECC RNS SM implementation of Algorithm 1, the performance of our proposed approach and its outcomes can be evaluated and analyzed. There are various evaluation metrics which can be used to evaluate the performance of machine learning models, including Accuracy (Acc), Precision, Recall (sensitivity), F1 score, the Receiver Operating Characteristic (ROC), and the Area Under the Curve (AUC). For binary classification problems on a balanced dataset (as is our case), accuracy is a sufficient evaluation metric. Accuracy is the ratio of correct predictions to the total number of predictions; hence, it exhibits the reliability of the model in a practical real-world scenario on unseen data.
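For completeness, these metrics can be computed with scikit-learn as follows; the labels and prediction scores below are placeholders.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

y_true = np.random.randint(0, 2, 100)    # placeholder true key bits
y_pred = np.random.randint(0, 2, 100)    # placeholder predicted key bits
y_score = np.random.rand(100)            # placeholder class-1 probabilities

print("Acc:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_score))
```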

As described in Sect. 3.2, four datasets of protected and unprotected leakage traces are evaluated using four machine learning classifiers. It should be noted that the parameter settings used for the experimental setup are also given at the end of each stage description in the methodology (Sect. 3). In this section, the experimental results are presented for the proposed hybrid feature engineering techniques. For better understanding, the results are presented in four subsections. Sect. 4.1 presents the classifiers' performance on raw features, without applying any feature engineering; Sect. 4.2 presents results after applying feature engineering techniques as explained in Sect. 3.5, Approach A; Sect. 4.3 exhibits comparison results for Approaches A, B and C of Sect. 3.5; and Sect. 4.4 depicts the effect of reduced validation folds and data splitting size. For the sets of experiments conducted in Sect. 4.1-4.3, the models are trained with the raw traces using the four classifiers, for all four datasets. For comparative analysis with existing studies, the analysis is divided into two sub-cases. In case (a), the machine learning analysis has been performed on the full-length traces, that is, all trace samples (trace length 0-699 and 0-2999 for MLDA and MLLA, respectively) are used as features for training the model. In case (b), the feature dataset is reduced and only the aligned part of the traces (precisely, samples 550-900 for DDP, 1150-1950 for DDUP, 80-250 for DLP, and 190-250 for DLUP) is used for training the models.

4.1 Classifier’s Performance on Raw features

Fig. 4a and 4b show the accuracy of the trained classifiers for case (a) and case (b), respectively. The plotted accuracy is achieved by tuning the hyperparameters as given in Table 2; the best selected parameters are given in Table 3. It can be observed that for location-dependent attacks (MLLA) in case (a), the secret can be recovered with 94-100% accuracy for the protected and unprotected implementations. However, for data-dependent attacks (MLDA), the best accuracy, approximately 54%, is achieved with RF. It should be noted that, in some cases, SMOTE is applied to balance the datasets before applying the machine learning classifiers. In addition to accuracy, the recall, precision, and F1 score have been closely monitored as well; they are below 0.5 in the case of CNN, but greater than 0.9 for the other classifiers.

(a) Trace dataset with all samples  (b) Trace dataset with aligned, reduced samples

Fig. 4: Accuracy of classifiers without feature processing

It has also been observed that the complex deep learning model (CNN) did not perform well for any of the datasets, which was expected because the datasets have a small number of traces. With a huge dataset, the performance of complex networks like CNNs might improve, but the collection of such a huge dataset and the high computational cost might be highly discouraging for the attacker. The scope of this study is to analyze the effect of limited-size datasets with computationally efficient classifiers. It has also been noticed that a simple neural network like MLP gives good accuracy if the complete trace length is used; however, it cannot classify the target key bit (accuracy around 53%) with the reduced trace length in case (b). This shows that a certain amount of useful information is also contained in the unaligned portion of the trace.

Due to their inherent capability of dealing with redundant features, reducing the number of features per trace does not affect the classification accuracy of either SVM or RF. RF, by design, constructs unpruned trees and removes unnecessary redundant features during the training process, hence producing an efficient model without any feature engineering technique. In SVM, the Radial Basis Function (RBF) kernel transforms the data and creates new features that are separable in a high-dimensional space; by design, it retains the most contributing features and eliminates unnecessary ones. It appears that the RNS-ECC SM location-based leakage is linearly separable in a higher-dimensional space. However, this is not the case for the RNS-ECC SM data-dependent leakages.

To analyze the possibility of under-fitting and over-fitting, the training, validation, and testing accuracies are all closely monitored in all cases. For SVM with RBF, it is observed that lower values of the parameter 'C' and higher values of the parameter 'gamma' provide the best results. The validation curve for gamma parameter tuning is given in Fig. 5. For RF, 50 and 100 trees with tree depths varying between 5 and 20 give good results. For MLP, a batch size of 32, the 'relu' activation function, and the 'adam' optimizer give the best results for the MLLA analysis. For the MLDA analysis, the 'tanh' activation function combined with the 'sgd' and 'adam' optimizers provides the best results for the protected and unprotected leakage datasets, respectively.

Fig. 5: Gamma Parameter Tuning
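A sweep of the kind plotted in Fig. 5 can be reproduced with scikit-learn's validation_curve; in the sketch below, synthetic data replaces the leakage traces and the gamma grid is illustrative.

```python
# Hedged sketch of RBF-SVM gamma tuning via a validation curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=200, random_state=0)
gammas = np.logspace(-3, 2, 6)            # candidate gamma values (illustrative)

train_scores, val_scores = validation_curve(
    SVC(kernel="rbf", C=1.0), X, y,
    param_name="gamma", param_range=gammas, cv=5, scoring="accuracy")

for g, tr, va in zip(gammas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"gamma={g:8.3f}  train acc={tr:.3f}  val acc={va:.3f}")
```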


Table 3: Best parameters for SVM, RF, MLP and CNN

Dataset | Classifier | Features | Parameters
DLP  | SVM | All     | C: 1.0, gamma: 40, kernel: rbf
DLP  | MLP | All     | activation: relu, batch_size: 32, solver: adam
DLP  | RF  | All     | max_depth: 20, n_estimators: 100
DLP  | CNN | All     | activation: relu, optimizer: adam, learning_rate: 0.001
DLP  | SVM | Reduced | C: 0.1, gamma: 40, kernel: poly
DLP  | MLP | Reduced | activation: relu, batch_size: 32, solver: adam
DLP  | RF  | Reduced | max_depth: 30, n_estimators: 50
DLP  | CNN | Reduced | activation: relu, optimizer: adam, learning_rate: 0.001
DLUP | SVM | All     | C: 0.1, gamma: 1, kernel: rbf
DLUP | MLP | All     | activation: relu, batch_size: 32, solver: adam
DLUP | RF  | All     | max_depth: 5, n_estimators: 50
DLUP | CNN | All     | activation: relu, optimizer: adam, learning_rate: 0.001
DLUP | SVM | Reduced | C: 0.01, gamma: 10, kernel: poly
DLUP | MLP | Reduced | activation: relu, batch_size: 32, solver: adam
DLUP | RF  | Reduced | max_depth: 5, n_estimators: 10
DLUP | CNN | Reduced | activation: relu, optimizer: adam, learning_rate: 0.001
DDP  | SVM | All     | C: 0.5, gamma: 50, kernel: rbf
DDP  | MLP | All     | activation: logistic, batch_size: 32, solver: sgd
DDP  | RF  | All     | max_depth: 20, n_estimators: 100
DDP  | CNN | All     | activation: relu, optimizer: adam, learning_rate: 0.001
DDP  | SVM | Reduced | C: 0.5, gamma: 10, kernel: rbf
DDP  | MLP | Reduced | activation: tanh, batch_size: 32, solver: adam
DDP  | RF  | Reduced | max_depth: 20, n_estimators: 10
DDP  | CNN | Reduced | activation: relu, optimizer: adam, learning_rate: 0.001
DDUP | SVM | All     | C: 0.5, gamma: 1, kernel: sigmoid
DDUP | MLP | All     | activation: tanh, batch_size: 32, solver: sgd
DDUP | RF  | All     | max_depth: 20, n_estimators: 100
DDUP | CNN | All     | activation: relu, optimizer: adam, learning_rate: 0.001
DDUP | SVM | Reduced | C: 0.5, gamma: 1, kernel: rbf
DDUP | MLP | Reduced | activation: logistic, batch_size: 32, solver: adam
DDUP | RF  | Reduced | max_depth: 10, n_estimators: 10
DDUP | CNN | Reduced | activation: relu, optimizer: adam, learning_rate: 0.001

Given the above results, a comparison can be made between the ML analysis and the state-of-the-art template attack results (based on the perceived information (PI)) on the ECC RNS SM implementation. For template attacks, PI utilizes practical leakages to estimate the probability density function (PDF) of the Algorithm 1 implementation. The steps explained in (20) are followed to estimate the PI of the RNS implementation leakages from the BeagleBone: profiling traces are first collected to estimate the leakage model, and PI is then estimated for the actual test leakages from the chip. The estimation and assumption errors are calculated to evaluate the attack model. It is observed that machine learning performs better than the template profiling attacks on the ECC RNS SM implementation datasets. For template attacks, the classification success rate for the location-based attacks is 87-99% for the unprotected implementation and for implementations with one countermeasure activated. When a combination of countermeasures is used, this percentage falls to 70-83%. For the machine learning analysis, the classification accuracy is 95% and 99.5% for the protected (DLP) and unprotected (DLUP) RNS-ECC SM implementations, respectively. In (44), the template attack on the RNS-ECC implementation is successful only if a specific sample window from each trace is selected for training. In the machine learning-based side-channel attack, however, the model trained with the complete trace length gives equal or better results. Isolating and selecting only the aligned part for the training phase might not be an easy task for an attacker, which makes the template attack more difficult. It is more convenient to train with the complete raw trace, which implies that machine learning attacks are less complex from an attacker's perspective.

4.2 Impact of Feature Engineering

In this part of the experimental analysis, advanced feature engineering techniques based on wrapper and filter methods, as explained in Sect. 3.5 approach A, are applied to analyze the impact of feature reduction on the trained model's performance. Fn = 50 features are selected from the full-length traces (with Fm = 3k and Fm = 700 features, respectively) and from the reduced-length traces (with varying numbers of features depending on the aligned portion), except for the T-test. For the T-test, the threshold is set to 0.5 and the resulting 1299 features are selected for further analysis. Results for the SVM model trained on the RNS-ECC protected datasets (MLLA) are shown in Fig. 6.

Fig. 6: Performance comparison for MLLA using SVM with feature extraction/selection techniques. (a) Trace dataset with all samples; (b) trace dataset with aligned reduced samples.
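As a hedged sketch of an approach-A style selection of Fn = 50 features, the code below uses SelectKBest (an F-test filter) and RF importances as stand-ins for the filter and RF-Imp selectors discussed in the text; the data is synthetic.

```python
# Filter-style selection of 50 features (approach A, illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=1000, n_features=3000, n_informative=50,
                           random_state=0)

# Filter method: keep the 50 features with the highest F-scores.
X_f = SelectKBest(score_func=f_classif, k=50).fit_transform(X, y)

# RF-importance ranking: keep the 50 most important features.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top50 = np.argsort(rf.feature_importances_)[-50:]
X_rf = X[:, top50]

print(X_f.shape, X_rf.shape)   # both reduced to (n_samples, 50)
```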

The purpose of applying feature engineering techniques is to find the optimal number of features for the bias-variance tradeoff. Variance in machine learning is the type of error that occurs due to the model's sensitivity to small fluctuations in the training dataset; high variance leads to over-fitting, as the model may learn from the noise in the data. Bias, on the other hand, is the type of error that occurs due to erroneous assumptions in the learning algorithm; high bias leads to under-fitting, as the model may miss relevant relationships between the features and the target key class. The two errors are inter-linked: minimizing one increases the other. Neural networks (high-capacity models) can suffer from high variance because they may learn from noise in the data; regularization, early stopping, and drop-out have been used to avoid this problem in our evaluation. For RF, pruning deals with these issues, so feature engineering is not required. For SVM, however, finding an optimal number of features improves the model's accuracy.
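The networks used in our evaluation are not reproduced here; the following generic sketch only illustrates the variance-control measures named above (L2 regularization, drop-out, and early stopping) with placeholder data and illustrative layer sizes.

```python
# Illustrative Keras MLP with regularization, drop-out, and early stopping.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

X = np.random.randn(1000, 700).astype("float32")   # stand-in trace features
y = np.random.randint(0, 2, size=(1000,))           # stand-in key-bit labels

model = keras.Sequential([
    layers.Input(shape=(700,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-3)),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, batch_size=32,
          callbacks=[early_stop], verbose=0)
```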

In the case of the RNS-ECC datasets, the bias is higher than the variance. When PCA is applied, the variance is increased and the bias is thus reduced; the variance is usually increased only to a level at which the model does not overfit. A suitable variance threshold (with 100% classification accuracy) is reached when the number of selected features is Fm = 50 for PCA. For case a, model performance stays the same or improves when using the T-test, RF-Imp, PCA, and LDA. For case b, improvement is observed for RF-Imp and PCA; however, performance decreases when the analysis is performed after reducing the features with LDA. LDA relies on the class labels and fails to extract the relevant features, because some of the information required to identify the relationship between the target class and the feature dataset is lost when the traces are trimmed during the alignment process.
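A minimal sketch of the PCA reduction to 50 components discussed above is given below; synthetic data again replaces the leakage traces.

```python
# PCA reduction to 50 components with the explained variance reported.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=1500, n_features=3000, random_state=0)

pca = PCA(n_components=50)
X_pca = pca.fit_transform(X)
print("Reduced shape:", X_pca.shape)
print("Total explained variance ratio:", pca.explained_variance_ratio_.sum())
```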

4.3 Hybrid Feature Selection Techniques

In this section, a comparative analysis is performed based on the evaluation results of the hybrid approaches of the proposed methodology on MLLA, as explained in Sect. 3.5 approaches B and C. For all hybrid methods, feature selection filter methods are first applied to reduce the bias in the input data by selecting fn = 300 independent features from the complete pool of fm = 3k (MLLA) and fm = 700 (MLDA) features; then only fo = 50 features are selected from this reduced pool using extraction techniques, for both case a and case b.
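A hedged sketch of such a filter-then-extraction chain is shown below as a scikit-learn pipeline: a filter keeps fn = 300 features, PCA then extracts fo = 50, and an RBF-SVM is trained on the result. The data and scores are synthetic placeholders, not the hybrid methods of the paper in their exact form.

```python
# Hybrid feature processing: filter (300 features) -> extraction (50) -> SVM.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=3000, n_informative=40,
                           random_state=0)

hybrid = Pipeline([
    ("filter", SelectKBest(score_func=f_classif, k=300)),  # filter step: fn = 300
    ("extract", PCA(n_components=50)),                      # extraction step: fo = 50
    ("clf", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

scores = cross_val_score(hybrid, X, y, cv=5, scoring="accuracy")
print("Mean CV accuracy:", scores.mean())
```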

For case a (Fig. 7a), the T-test gives the best results with approaches A and B. In general, combining filter-based feature selection with recursive feature elimination reduces the model accuracy; one possible reason is that the features are highly correlated with each other rather than with the target class. Approach B with PCA returns an accuracy greater than 80%. For the F-test, MI, and Chi2, the hybrid approach C increases the resulting accuracy by 13-30%. For case b, some of the hybrid methods show an improvement in accuracy compared to Fig. 4b.

Fig. 7: Performance comparison of hybrid feature processing approaches. (a) Trace dataset with all samples; (b) trace dataset with aligned reduced samples.

4.4 Impact of Data Splitting Size

In this analysis phase, we performed a quantitative analysis as described in Sect. 3.4. From the best-performing feature selection techniques (those with accuracy greater than 95%), one was chosen at random (PCA on the protected dataset DDP) to further investigate the impact of varying data splitting ratios for the RNS-ECC dataset. As Fig. 8 shows, the best results are obtained with a data splitting ratio of 90:10 for training and testing data.
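A minimal sketch of the 90:10 split found to work best is shown below; the PCA-reduced features are synthetic stand-ins for the DDP dataset.

```python
# 90:10 train/test split on PCA-reduced features, followed by an RBF-SVM.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=700, random_state=0)
X50 = PCA(n_components=50).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X50, y, test_size=0.10,
                                          stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("Test accuracy with a 90:10 split:", clf.score(X_te, y_te))
```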

Fig. 8: Impact of data splitting size on model accuracy. (a) Trace dataset with all samples; (b) trace dataset with aligned samples only.

In (9), a total of 60,000 instances are used for symmetric ciphers, of which 50,000 are for training and 10,000 for testing. Expectedly, such a large set of traces is well suited to training deep learning algorithms like CNN; however, the required training time is also high. In this study, we have evaluated the effect of having a small number of traces available for key retrieval. We have seen that the location-dependent attack succeeds in recovering the key with few traces and in less time, using as few as 3 validation folds.
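Evaluating with as few as 3 validation folds can be sketched as follows; the data is again a synthetic placeholder for the trace features.

```python
# Stratified 3-fold cross-validation, as discussed above.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1200, n_features=700, random_state=0)

cv3 = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv3, scoring="accuracy")
print("Per-fold accuracy:", scores)
```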

5 Conclusion

In this paper, we have presented an evaluation methodology for machine learning-based side-channel attacks on an elliptic-curve RNS-based scalar multiplier implementation, with and without RNS and traditional SCA countermeasures. Each stage of the methodology was described along with a practical experimental realization. A detailed analysis of the results of the proposed methodology on the ECC RNS SM implementation was provided in four phases of analysis. A comparison was made with the state-of-the-art template attacks on the RNS-ECC balanced and imbalanced datasets. It can be concluded that machine learning-based side-channel attacks require less preprocessing and give better performance for location-based profiling attacks, hence leading to a time-efficient, realistic attack scenario. The secret key can be recovered from the unprotected and protected RNS-ECC SM implementations using the location-based attack with 99% and 95% accuracy, respectively.

The impact of advanced feature engineering techniques has been analyzed using feature extraction and feature selection methods, and several hybrid approaches were also evaluated. It has been observed that PCA, LDA, the T-test, and RF-based feature selection provide improved accuracy results.

We have also evaluated the effect of training the model with a small dataset, that is, a dataset containing only the reduced aligned samples, to classify RNS-ECC key bits using machine learning-based side-channel attacks. We have observed that, for location-based attacks, SVM and RF can successfully distinguish the scalar key bit with more than 95% accuracy for both the full-length and the reduced-length aligned trace datasets. The trace sample window does not affect the classification results for SVM and RF, due to their inherent characteristic of eliminating redundant features during the training process. MLP, however, can distinguish and classify the scalar key bit correctly only if the full trace length is used; if the reduced trace, based on the aligned part, is used for training an MLP network, useful information is lost during the alignment process and the model fails to classify the scalar key bit. This reduces the complexity of the attack and increases the attack success rate in a real-world scenario. The RNS-ECC implementations showed resistance against machine learning-based data-dependent attacks.

Machine learning-based side-channel attacks on PKC provide a realistic, efficient attack scenario for recovering secret information, as they require less preprocessing than template attacks on RNS-ECC implementations.


Bibliography

[1] Inspector SCA tool. URL: https://www.riscure.com/security-tools/inspector-sca/. Accessed: 2017-12-14.

[2] National Computational Infrastructure Australia. URL: https://nci.org.au/our-services/supercomputing.

[3] Christos Andrikos, Lejla Batina, Lukasz Chmielewski, Liran Lerman, Vasilios Mavroudis, Kostas Papagiannopoulos, Guilherme Perin, Giorgos Rassias, and Alberto Sonnino. Location, location, location: Revisiting modeling and exploitation for location-based side channel leakages. In Advances in Cryptology - ASIACRYPT 2019 - 25th International Conference on the Theory and Application of Cryptology and Information Security, Kobe, Japan, December 8-12, 2019, Proceedings, Part III, volume 11923 of Lecture Notes in Computer Science, pages 285–314. Springer, 2019. doi:10.1007/978-3-030-34618-8_10.

[4] Jean-Claude Bajard, Sylvain Duquesne, and Nicolas Meloni. Combining Montgomery Ladder for Elliptic Curves defined over Fp and RNS Representation. Research Report 06041, 2006.

[5] Jean-Claude Bajard, Julien Eynard, and Filippo Gandino. Fault Detection in RNS Montgomery Modular Multiplication. In IEEE 21st Symposium on Computer Arithmetic, pages 119–126. IEEE, April 2013. doi:10.1109/ARITH.2013.31.

[6] Jean-Claude Bajard, Laurent Imbert, Pierre-Yvan Liardet, and Yannick Teglia. Leak Resistant Arithmetic. In Marc Joye and Jean-Jacques Quisquater, editors, Cryptographic Hardware and Embedded Systems - CHES, Lecture Notes in Computer Science, pages 62–75, 2004.

[7] Lejla Batina, Jip Hogenboom, and Jasper G. J. van Woudenberg. Getting more from PCA: first results of using principal component analysis for extensive power analysis. In Orr Dunkelman, editor, Topics in Cryptology - CT-RSA 2012 - The Cryptographers' Track at the RSA Conference 2012, San Francisco, CA, USA, February 27 - March 2, 2012, Proceedings, volume 7178 of Lecture Notes in Computer Science, pages 383–397. Springer, 2012. doi:10.1007/978-3-642-27954-6_24.

[8] Peter Belhumeur, Joao Hespanha, and David Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711–720, 1997. doi:10.1109/34.598228.

[9] Ryad Benadjila, Emmanuel Prouff, Remi Strullu, Eleonora Cagli, and Cecile Dumas. Deep learning for side-channel analysis and introduction to ASCAD database. Journal of Cryptographic Engineering, 2019. doi:10.1007/s13389-019-00220-8.


[10] Avrim L. Blum and Pat Langley. Selection of relevant features and examples in machine learning. Artif. Intell., 97(1–2):245–271, December 1997. doi:10.1016/S0004-3702(97)00063-5.

[11] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001. doi:10.1023/A:1010933404324.

[12] Eleonora Cagli, Cecile Dumas, and Emmanuel Prouff. Convolutional Neural Networks with Data Augmentation Against Jitter-Based Countermeasures. In Cryptographic Hardware and Embedded Systems - CHES 2017, pages 45–68, 2017. doi:10.1007/978-3-319-66787-4_3.

[13] Mathieu Carbone, Vincent Conin, Marie-Angela Cornelie, Francois Dassance, Guillaume Dufresne, Cecile Dumas, Emmanuel Prouff, and Alexandre Venelli. Deep Learning to Evaluate Secure RSA Implementations. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(2):132–161, 2019. doi:10.13154/tches.v2019.i2.132-161.

[14] Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. Template attacks. In Burton S. Kaliski Jr., Cetin Kaya Koc, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers, volume 2523 of Lecture Notes in Computer Science, pages 13–28. Springer, 2002. doi:10.1007/3-540-36400-5_3.

[15] Nitesh Chawla, Kevin Bowyer, Lawrence Hall, and W. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. (JAIR), 16:321–357, 2002. doi:10.1613/jair.953.

[16] Francois Chollet et al. Keras. https://keras.io, 2015.

[17] Mathieu Ciet, Michael Neve, Eric Peeters, and Jean-Jacques Quisquater. Parallel FPGA implementation of RSA with residue number systems - can side-channel threats be avoided? Extended version, Cryptology ePrint Archive, Report 2004/187, 2004.

[18] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Mach. Learn., 20(3):273–297, September 1995. doi:10.1023/A:1022627411411.

[19] E. De Mulder, S. B. Ors, B. Preneel, and I. Verbauwhede. Differential Power and Electromagnetic Attacks on a FPGA Implementation of Elliptic Curve Cryptosystems. Comput. Electr. Eng., 33(5–6):367–382, September 2007. doi:10.1016/j.compeleceng.2007.05.009.

[20] Francois Durvaux, Francois-Xavier Standaert, and Nicolas Veyrat-Charvillon. How to certify the leakage of a chip? In EUROCRYPT, pages 459–476. Springer, 2014. URL: https://www.iacr.org/archive/eurocrypt2014/84410138/84410138.pdf.


[21] Apostolos P. Fournaris. RNS LRA EC scalar Multiplier, 2018. https://github.com/afournaris/RNS_LRA_EC_Scalar_Multiplier.

[22] Apostolos P. Fournaris, Nicolaos Klaoudatos, Nicolas Sklavos, and Christos Koulamas. Fault and Power Analysis Attack Resistant RNS based Edwards Curve Point Multiplication. In Proceedings of the 2nd Workshop on Cryptography and Security in Computing Systems, CS2 at HiPEAC 2015, Amsterdam, Netherlands, January 19-21, 2015, pages 43–46, 2015.

[23] Apostolos P. Fournaris, Louiza Papachristodoulou, Lejla Batina, and Nicolaos Sklavos. Residue Number System as a side channel and fault injection attack countermeasure in elliptic curve cryptography. In 2016 International Conference on Design and Technology of Integrated Systems in Nanoscale Era (DTIS), pages 1–4, April 2016. doi:10.1109/DTIS.2016.7483807.

[24] Apostolos P. Fournaris, Louiza Papachristodoulou, and Nicolas Sklavos. Secure and Efficient RNS Software Implementation for Elliptic Curve Cryptography. In 2017 IEEE European Symposium on Security and Privacy Workshops, pages 86–93. IEEE, April 2017. doi:10.1109/EuroSPW.2017.56.

[25] Daniel Genkin, Adi Shamir, and Eran Tromer. RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis. In Juan A. Garay and Rosario Gennaro, editors, Advances in Cryptology - CRYPTO 2014, pages 444–461, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.

[26] Gilbert Goodwill, Benjamin Jun, Josh Jaffe, and Pankaj Rohatgi. A testing methodology for side channel resistance validation. NIST non-invasive attack testing workshop, 2011.

[27] Richard Gilmore, Neil Hanley, and Maire O'Neill. Neural Network Based Attack on a Masked Implementation of AES. In 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 106–111. IEEE, 2015. doi:10.1109/HST.2015.7140247.

[28] Anupam Golder, Debayan Das, Josef Danial, Santosh Ghosh, Shreyas Sen, and Arijit Raychowdhury. Practical approaches toward deep-learning-based cross-device power side-channel attack. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27:2720–2733, 2019.

[29] Guilherme Perin, Laurent Imbert, Lionel Torres, and Philippe Maurine. Attacking randomized exponentiations using unsupervised learning. 2014. doi:10.1007/978-3-319-10175-0_11.

[30] Nicolas Guillermin. A Coprocessor for Secure and High Speed Modular Arithmetic. IACR Cryptology ePrint Archive, 2011.

[31] Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine learning in side-channel analysis: a first study. Journal of Cryptographic Engineering, 1(4):293, October 2011. doi:10.1007/s13389-011-0023-x.


[32] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. Springer, 2013.

[33] George H. John, Ron Kohavi, and Karl Pfleger. Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Conference on Machine Learning, ICML'94, pages 121–129, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.

[34] Ian Jolliffe. Principal Component Analysis, pages 1094–1096. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011. doi:10.1007/978-3-642-04898-2_455.

[35] Marc Joye and Sung-Ming Yen. The Montgomery powering ladder. In Burton S. Kaliski Jr., Cetin Kaya Koc, and Christof Paar, editors, Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers, volume 2523 of Lecture Notes in Computer Science, pages 291–302. Springer, 2002. doi:10.1007/3-540-36400-5_22.

[36] Jaehun Kim, Stjepan Picek, Annelie Heuser, Shivam Bhasin, and Alan Hanjalic. Make Some Noise. Unleashing the Power of Convolutional Neural Networks for Profiled Side-channel Analysis. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(3):148–179, May 2019. doi:10.13154/tches.v2019.i3.148-179.

[37] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. In Michael Wiener, editor, Advances in Cryptology - CRYPTO' 99, pages 388–397, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.

[38] Ron Kohavi and George H. John. Wrappers for feature subset selection. Artif. Intell., 97(1–2):273–324, December 1997. doi:10.1016/S0004-3702(97)00043-X.

[39] Yann LeCun, Patrick Haffner, Leon Bottou, and Yoshua Bengio. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, page 319, Berlin, Heidelberg, 1999. Springer-Verlag.

[40] Houssem Maghrebi, Thibault Portigliatti, and Emmanuel Prouff. Breaking cryptographic implementations using deep learning techniques. IACR Cryptology ePrint Archive, 2016:921, 2016. URL: http://eprint.iacr.org/2016/921.

[41] Olivier Markowitch, Liran Lerman, and Gianluca Bontempi. Side channel attack: An approach based on machine learning. In Constructive Side-Channel Analysis and Secure Design, COSADE, 2011.

[42] Paulo Martins and Leonel Sousa. The role of non-positional arithmetic on efficient emerging cryptographic algorithms. IEEE Access, 8:59533–59549, 2020.

[43] Naila Mukhtar, Ali Mehrabi, Yinan Kong, and Ashiq Anjum. Machine-learning-based side-channel evaluation of elliptic-curve cryptographic FPGA processor. Applied Sciences, 9(1):64, 2018. doi:10.3390/app9010064.

[44] Louiza Papachristodoulou, Apostolos P. Fournaris, Kostas Papagiannopoulos, and Lejla Batina. Practical evaluation of protected residue number system scalar multiplication. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(1):259–282, November 2018. URL: https://tches.iacr.org/index.php/TCHES/article/view/7341.

[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[46] Stjepan Picek, Annelie Heuser, Alan Jovic, and Lejla Batina. A Systematic Evaluation of Profiling Through Focused Feature Selection. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 27:2802–2815, 2019.

[47] Stjepan Picek, Annelie Heuser, Alan Jovic, Shivam Bhasin, and Francesco Regazzoni. The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst., 2019(1):209–237, 2019. doi:10.13154/tches.v2019.i1.209-237.

[48] Dimitrios Schinianakis, Apostolos Fournaris, Harris Michail, Athanasios Kakarountas, and Thanos Stouraitis. An RNS implementation of an elliptic curve point multiplier. IEEE Transactions on Circuits and Systems I: Regular Papers, 56:1202–1213, 2009.

[49] Jurgen Schmidhuber. Deep learning in neural networks. Neural Netw., 61(C):85–117, January 2015. doi:10.1016/j.neunet.2014.09.003.

[50] D. L. Swets and J. J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):831–836, 1996.

[51] Leo Weissbart, Stjepan Picek, and Lejla Batina. One Trace Is All It Takes: Machine Learning-Based Side-Channel Attack on EdDSA. In Shivam Bhasin, Avi Mendelson, and Mridul Nandi, editors, Security, Privacy, and Applied Cryptography Engineering - 9th International Conference, SPACE 2019, Gandhinagar, India, December 3-7, 2019, Proceedings, volume 11947 of Lecture Notes in Computer Science, pages 86–105. Springer, 2019. doi:10.1007/978-3-030-35869-3_8.

[52] David Wolpert. The supervised learning no-free-lunch theorems. 2001. doi:10.1007/978-1-4471-0123-9_3.

[53] Gabriel Zaid, Lilian Bossuet, Amaury Habrard, and Alexandre Venelli. Methodology for Efficient CNN Architectures in Profiling Attacks. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2020(1):1–36, November 2019. doi:10.13154/tches.v2020.i1.1-36.

[54] Z. Zeng, D. Gu, J. Liu, and Z. Guo. An Improved Side-Channel Attack Based on Support Vector Machine. In 2014 Tenth International Conference on Computational Intelligence and Security, pages 676–680, November 2014. doi:10.1109/CIS.2014.80.
