
Deliverable D4.1

Definition of the large-scale IoT Security & Trust benchmarking methodology

Version: 1.1
Lead Partner: OdinS
Date: 12/04/2017
Project Name: ARMOUR – Large-Scale Experiments of IoT Security Trust


Call Identifier: H2020-ICT-2015
Topic: ICT-12-2015 Integrating experiments and facilities in FIRE+
Project Reference: 688237
Type of Action: RIA – Research and Innovation Action
Start date of the project: February 1st, 2016
Duration: 24 Months
Dissemination Level: PU (Public)

Abstract: D4.1 aims to provide the initial design of the benchmarking methodology to empower testers and experimenters with the ability to assess security solutions for large-scale IoT deployments. The deliverable provides a description of this methodology, which is built as a unified approach on top of different technologies and approaches for security testing and benchmarking in the IoT landscape. The proposed methodology is intended to be used in the different experiments proposed in the scope of the project to assess the fulfilment of several security aspects. Furthermore, the results from the application of this methodology to the ARMOUR experiments will be made available via the FIESTA testbed facilities. The proposed methodology is expected to be used as a baseline to build a new security certification and labelling approach for IoT devices.

Disclaimer This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688237, but this document only reflects the consortium’s view. The European Commission is not responsible for any use that may be made of the information it contains.


Revision History

Revision | Date       | Description                                                             | Organisation
0.1      | 04/07/2016 | Creation. First outline proposal                                        | OdinS
0.2      | 16/12/2016 | ToC update. Content addition to Section 2                               | OdinS
0.3      | 17/01/2017 | Section 2 updated. Content addition to Sections 1, 3 and 4              | OdinS, JRC, Smartesting
0.4      | 26/01/2017 | Sections 3 and 4 updated. Revision of the document for internal review  | OdinS
1.0      | 30/01/2017 | Proof-reading and minor corrections                                     | UPMC
1.1      | 12/04/2017 | Minor corrections following the feedback from the Review                | OdinS, UPMC


Table of Contents

1 Introduction
2 Security Benchmarking and Certification in IoT
  2.1 Existing approaches
    2.1.1 CSPN
    2.1.2 UL 2900
    2.1.3 CPA
    2.1.4 CSA STAR
    2.1.5 ECSO WG1: Basic framework for cybersecurity assessments
    2.1.6 Smart-Grid Task Force Stakeholder Forum. WP4. Best available techniques (BAT) of smart metering
  2.2 Common Criteria
  2.3 Towards a benchmarking and certification methodology for security in large-scale IoT scenarios
3 ARMOUR methodology for benchmarking security and privacy in IoT
  3.1 Benchmarking for security and trust in IoT
  3.2 Approach overview
  3.3 Experiment design
  3.4 Test design
  3.5 Test generation
  3.6 Test execution
  3.7 Labelling
4 Applying ARMOUR benchmarking methodology to concrete experiments
  4.1 EXP1 Experiment design
  4.2 EXP1 Test design
  4.3 EXP1 Test generation
  4.4 EXP1 Test execution
    4.4.1 Scalability
  4.5 EXP1 Labelling
    4.5.1 Example of benchmarking: Confidentiality
5 Conclusions
References


1 Introduction

The IoT landscape of solutions and technologies still provides a fragmented and disharmonized view, which must evolve towards a more unified approach in order to ensure interoperability in large-scale IoT scenarios. This issue is exacerbated when we move to security solutions, since they involve companies, public administrations, regulatory bodies and citizens. These stakeholders have different needs and requirements that must be reconciled to guarantee the deployment of trustable IoT infrastructures.

In this context, benchmarking security and trust in large-scale IoT deployments is a key aspect to give experimenters and testers the ability to assess and compare different IoT security solutions. This is required to design interoperable and flexible security and privacy solutions able to satisfy IoT stakeholders' requirements. This deliverable is intended to provide an initial definition of the main stages of the ARMOUR benchmarking methodology. This methodology is built on top of concepts and technologies of existing approaches for benchmarking in different Information and Communication Technology (ICT) domains (see Section 2), in order to build a novel certification scheme for security aspects in IoT devices. The proposed benchmarking methodology (Section 3) is additionally based on different security dimensions, including several security attacks and properties to be tested and compared. The ARMOUR benchmarking methodology is intended to be used by the certification approach (as part of WP4). Furthermore, the application of the methodology to a specific experiment is provided in Section 4.

This is the first deliverable of Task 4.1. The following table shows how this deliverable addresses the different elements of Task 4.1.

Elements of Task 4.1:

This task will deal with the establishment of the ARMOUR benchmarking methodology for measuring the performance of (and then comparing) Large-Scale IoT Security & Trust solutions. Several dimensions will be considered including security attacks detection, defence against attacks and misbehaviour, ability to use trusted sources and channels, levels of security & trust conformity, etc. Benchmark results of the Large-Scale IoT Security & Trust solutions experiments will be performed and datasets will be made available via the FIESTA testbed.

How this deliverable addresses the specific elements of Task 4.1:

• Section 2 provides a description of existing approaches for benchmarking and certification.

• Section 3 presents the ARMOUR benchmarking methodology.

• Section 4 introduces the application of such methodology to a specific experiment of the project.


2 Security Benchmarking and Certification in IoT

Nowadays, security aspects represent one of the most significant barriers for the adoption of large-scale IoT deployments [1]. Manufacturers of IoT devices are working together with standardization bodies to build the next generation of more secure and standardized smart objects, but certification of security aspects remains an open issue. Security threats are increasing due to the ubiquitous nature of the next digital era, transforming these aspects into a major concern for companies, governments and regulatory bodies. In this sense, a suitable certification scheme [2] would help to assess and compare different security technologies, in order to provide a more harmonized IoT security view to be leveraged by end consumers.

As part of this certification process, being able to benchmark [3] different IoT security approaches is crucial to quantify their security level. In this section, different benchmarking and certification approaches are discussed. These approaches are intended to be used as the baseline for the definition of the ARMOUR benchmarking methodology.

2.1 Existing approaches

2.1.1 CSPN

The Certification de Sécurité de Premier Niveau (CSPN) [4] is a certification scheme created by the National Cybersecurity Agency of France (ANSSI) in 2008. The main goal of the CSPN certification is to verify that a specific product meets its security requirements. This goal represents the union of three main CSPN objectives: firstly, the verification of the product's compliance with its security specifications; secondly, the assessment of the mechanisms and of known vulnerabilities of similar products; and thirdly, testing the product by trying to break its security functionality. This kind of evaluation is conducted under the terms of the conformity analysis (verifying that a product complies with its security specifications, seeing the product as a whole in order to perform relevant analyses) and the analysis of efficiency (rating the theoretical strength of the security functions and mechanisms, and identifying vulnerabilities).

In CSPN, the security target of a Target Of Evaluation (TOE, from Common Criteria notation¹) is analysed, and its compliance is then tested through different intrusion tests on that TOE. A significant point to be highlighted is that CSPN aims at performing evaluations in a short period, adapting to the product development lifecycle. This is particularly important to reconcile the time requirements of the IoT market for new products with a strong consideration of security aspects. It should be noted that the bodies responsible for CSPN evaluations are licensed by ANSSI, which ensures competence and independence at the same time.

Finally, the main steps of the CSPN approach can be summarized as follows²:

1. Definition/drafting of a security target precisely describing the scope of the evaluation (when needed, some parts of the product can be excluded from the evaluation). This step is done jointly by the developers and the evaluator.

2. Validation of the security target by the ANSSI.


3. The developers then hand over all the evaluation material to the evaluator. This evaluation material may or may not include source code. It should at least include a fully functional version of the product and, for the cryptographic analysis, a precise description of the cryptographic protocols implemented.

4. Time-constrained evaluation of the product. During this step, further exchanges between the evaluator and the developers are still possible. The developers may hand over additional material to the evaluator, or simply answer any questions the evaluator may have. This time-constrained step also includes the drafting of a detailed evaluation report listing all the tests performed during the evaluation and their results, and providing a list of all uncovered security issues.

5. Validation of the conclusions of the report by the ANSSI.

6. Delivery of the CSPN certificate by the ANSSI.

¹ http://www.commoncriteriaportal.org/cc/
² https://www.cryptoexperts.com/services/cspn/

A description of CSPN is provided in [5], where it is pointed out that CSPN is complementary to CC rather than an alternative. In fact, CSPN can also precede the CC process, by analysing a TOE and addressing its vulnerabilities within a constrained time (a typical certification takes around 25 days, which is significantly less than a typical CC certification).

2.1.2 UL 2900

Underwriters Laboratories (UL) is a company providing testing and safety certification against electrical, building, fire, mechanical and other codes, to ensure that products sold to consumers and companies are as safe as possible and follow the UL standards. The company announced in 2016 its Cybersecurity Assurance Program (UL CAP), which uses the UL 2900 series of standards [6]. This series of standards aims to provide testable cybersecurity criteria to assess vulnerabilities and weaknesses in connected devices. Furthermore, it is intended to provide a set of technical criteria for testing and evaluating the security of different products.

The standard is currently divided into three parts:

• UL 2900-1 – Outline of Investigation for Software Cybersecurity for Network-Connectable Products, Part 1: General Requirements. This outline applies to network-connectable products that shall be evaluated and tested for vulnerabilities, software weaknesses and malware.

• UL 2900-2-1 – Outline of Investigation for Software Cybersecurity for Network-Connectable Products, Part 2-1: Particular Requirements for Network Connectable Components of Healthcare Systems. This security evaluation outline applies to the testing of network-connected components of healthcare systems.

• UL 2900-2-2 – Outline of Investigation for Software Cybersecurity for Network-Connectable Products, Part 2-2: Particular Requirements for Industrial Control Systems. This security evaluation outline applies to the evaluation of industrial control system components, such as process control systems, control servers, SCADA servers, Remote Terminal Units (RTU) or smart sensors, among others.

The standards were published only a few months ago, which means they are not yet widely recognized as an accepted certification scheme.


2.1.3 CPA

Commercial Product Assurance (CPA) is an approach from the Communications-Electronics Security Group (CESG) in the UK to increase the level of trust in the security aspects of commercial products. CPA aims to identify the suitable level of security assurance for a product, and to test and certify security software and hardware within the UK government, in order to ensure the appropriate level of security for sensitive data and information. Furthermore, one of its main objectives is to consolidate previous schemes from CESG by offering a certificate-based assurance of security products.

Under CPA, a security product that passes assessment is awarded a Foundation Grade certificate. This means the product is proven to demonstrate good commercial security practice and is suitable for lower threat environments. CPA certification is valid for two years and allows products to be updated during the lifetime of the certification as vulnerabilities arise and updates are required. Products are tested against published CPA Security Characteristics [7].

However, while it was intended to supplant other approaches such as Common Criteria, there is no Mutual Recognition Agreement (MRA) for CPA. This basically means that products tested in the UK will not normally be accepted in other markets.

2.1.4 CSA STAR

The Cloud Security Alliance Security, Trust & Assurance Registry (CSA STAR) [8] Certification is a scheme developed to address specific issues relating to cloud security, as an enhancement to ISO/IEC 27001. STAR consists of three levels of assurance (Figure 1). CSA STAR is built on top of two main aspects:

• Cloud Controls Matrix (CCM): as a controls framework, the CSA CCM provides organizations with the needed structure, detail and clarity relating to information security tailored to cloud computing.

• The Consensus Assessments Initiative Questionnaire (CAIQ): based upon the CCM, the CAIQ provides a set of Yes/No questions a cloud consumer and cloud auditor may wish to ask of a cloud provider to ascertain their compliance with the Cloud Controls Matrix and CSA best practices.


Figure 1: CSA STAR Levels of assurance

These three levels of assurance are described as [8]:

• Level One: Self-Assessment. It provides documents about the security controls provided by different cloud computing offerings, to help users assess the security of different cloud providers.

• Level Two:

  o Attestation. It is intended to provide a rigorous third-party independent assessment of cloud providers.

  o Certification. It pursues the same goal by leveraging the requirements of ISO/IEC 27001 together with the CCM.

  o Assessment. It is a robust third-party independent assessment of the security of a cloud service provider for the Greater China market that harmonizes CSA best practices with Chinese national standards.

• Level Three: Continuous Monitoring. It enables automation of the current security practices of cloud providers.

2.1.5 ECSO WG1: Basic framework for cybersecurity assessments

Within the European cybersecurity strategy, one of the main current objectives is the establishment of a platform to provide a unified set of security guidelines that foster the development and adoption of secure ICT products. The convergence between industry and the public sector has resulted in the European Cyber Security Organisation (ECSO), which brings together relevant European public and private stakeholders.

The main ECSO tasks are split into 6 Working Groups. Specifically, WG1: Standardisation, certification, labelling and supply chain management aims to promote convergence through the support of security standards and the assistance with EU-wide certification schemes in specific areas. Considering its different areas of interest, WG1 tasks are grouped into four sub-working groups, addressing the issues of components/products (SWG1), infrastructures (SWG2) and organizations (SWG3). A sub-working group for basic requirements (SWG4) is set up to guarantee synchronization between the SWGs and to lead continuous gap analysis during the activity of ECSO.


The main objective of the ECSO WG1 is to construct a basic framework, as much as possible based on existing standards, in order to develop a cybersecurity evaluation and certification system for the benefit of the protection and security of the European citizens. ECSO WG1 aims to accomplish these tasks via a joint and cooperative work towards a unified approach.

Within the WG1, some fundamental elements and guidelines of labelling and certification are already considered as the baseline towards previous objectives. Firstly, labelling and certification will clearly identify the object of the evaluation. This includes the component, infrastructure, service or system under consideration, and a description of its operational environment. Secondly, labelling and certification will define various levels of evaluation, with different assurance requirements per level. In this sense, the proposal is based on the design of levelling strategies, similarly to Common Criteria.

While ECSO WG1 represents a recent effort, it is meant to be key for the definition of a certification and labelling approach for IoT security concerns in Europe.

2.1.6 Smart-Grid Task Force Stakeholder Forum. WP4. Best available techniques (BAT) of smart metering

The Best Available Techniques Reference Document for the cyber-security and privacy of the 10 minimum functional requirements of the Smart Metering Systems³ aims at identifying the most suitable techniques to increase the level of cyber-security and privacy of smart-metering systems with respect to the 10 minimum functional requirements of COM 2012/148/EU. The main objective of WP1 is to define a coherent and reliable evaluation methodology to be used in WP3 to identify the Best Available Techniques related to the 10 minimum functional requirements.

For an objective comparison of each suggested technique, three elements are defined:

• The dimensions to be evaluated

• The criteria to be taken into consideration for each dimension

• A framework allowing an evaluation to be derived among the techniques, combining the specific evaluation of each of the identified dimensions and criteria

The objective of the evaluation framework defined is to enable the comparison of specific techniques. To support the comparison, a metric scheme is defined to score a particular technique. For this, the metric is categorized by different dimensions. Furthermore, each dimension consists of specific criteria. The structure of the metric is outlined in the scheme of Figure 2 (left).

For each criterion, the individual technique is assigned points from 0 to 2. These points shall be awarded to rank the efficiency or effectiveness of the technique for this specific criterion. Assigning "0" points shall represent the lowest score (not effective), "2" shall be the highest score (very effective), and "1" represents a moderate effectiveness (applicable, but with drawbacks). In some cases, a criterion will only be scored "0" or "2", since its judgement might be only binary.

³ https://ec.europa.eu/energy/sites/ener/files/documents/bat_wp4_bref_smartmetering_systems_final_deliverable.pdf

The mathematical composition of the metric scoring mechanism is described hereinafter.

The set of criteria in a dimension $D_i$ shall be denoted $C_i$. The cardinality of the set $C_i$ is denoted $|C_i|$. The score given for a particular criterion $j$ of the dimension $D_i$ shall be denoted $c_{ij}$.

The sum of all criteria scores of a dimension $D_i$ shall be denoted $d_i$:

$$d_i = \sum_{j} c_{ij}, \qquad c_{ij} \in \{0, 1, 2\}$$

There may be cases where the evaluation of a given criterion is not appropriate for a technique. If so considered, the decision should be clearly justified. In this case the set $C_i$ will not take this particular criterion into consideration.

All dimensions are equally relevant and thus are weighted equally. However, different dimensions might be evaluated through a different number of criteria, so a normalisation step is needed. To this end, a weight is introduced for every dimension; the weight for a dimension $D_i$ shall be denoted $w_i$.

For each dimension $D_i$ with the criteria set $C_i$ assigned, the weight $w_i$ of this dimension is defined as the reciprocal of the cardinality of the criteria set:

$$w_i = \frac{1}{|C_i|}$$

Figure 2: Structure of the metric for assessing techniques (left) and an evaluation example (right)


After the ranking process, the sum of all awarded points per dimension, with the exception of the Financial Impact, forms the overall metric $m_1$. The metric $m_1$ used to score a technique is defined as:

$$m_1 = \sum_{i} w_i \cdot d_i$$

The $m_1$ ranking of a technique allows deciding that a technique provides an efficient solution in a given dimension but has shortcomings in other dimensions. The metric $m_1$ is indeed a metric ranking the ability of a technique to mitigate the risk on personal data and security. Figure 2 (right) shows an evaluation example.
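To make the scoring mechanism concrete, the following minimal Python sketch computes the weighted metric $m_1$ from per-dimension criterion scores. The dimension names, criteria and score values used here are purely illustrative assumptions, not values taken from the BAT reference document.

```python
# Illustrative sketch of the BAT metric m1 = sum_i(w_i * d_i), where
# d_i is the sum of the criterion scores of dimension D_i and
# w_i = 1 / |C_i| normalises dimensions with different numbers of criteria.
# Dimension names and scores below are hypothetical examples.

from typing import Dict

def overall_metric(scores: Dict[str, Dict[str, int]]) -> float:
    """scores maps dimension name -> {criterion name -> score in {0, 1, 2}}."""
    m1 = 0.0
    for dimension, criteria in scores.items():
        if not criteria:                  # a dimension with no applicable criteria is skipped
            continue
        d_i = sum(criteria.values())      # sum of criterion scores
        w_i = 1.0 / len(criteria)         # reciprocal of |C_i|
        m1 += w_i * d_i
    return m1

example = {
    "Cyber-security": {"confidentiality": 2, "integrity": 2, "authentication": 1},
    "Privacy and Data Protection": {"data minimization": 1, "anonymity": 0},
    # Financial Impact is excluded from m1 and therefore not listed here.
}

print(overall_metric(example))  # (2+2+1)/3 + (1+0)/2 ≈ 2.17
```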

The metrics to evaluate the techniques are aggregated into the following dimensions:

• Cyber-security: it can be analysed by taking into consideration the following 8 criteria: confidentiality, availability, integrity, access to key material, integrity of key material, authentication, auditing/logging, and non-repudiation.

• Privacy and Data Protection: it includes the criteria data retention, data minimization, data control, data access and anonymity.

• Maturity and Upgradeability of Technique: it ranks the technique with respect to its maturity and its ability to be upgraded easily. The criteria are implementation scale, standardisation and upgradability.

• Impact of Technique towards Architecture: it ranks the technique with respect to its impact on a given architectural design and the considered services. The criteria are communication overhead generated, bandwidth required, latency tolerance / "always-on communication required?" and impact on processes.

2.2 Common Criteria

The Common Criteria (CC) [9] permits comparability between the results of independent security evaluations. The CC does so by providing a common set of requirements for the security functionality of IT products and for assurance measures applied to these IT products during a security evaluation. The evaluation process establishes a level of confidence that the security functionality of these IT products and the assurance measures applied to these IT products meet these requirements. The evaluation results may help consumers to determine whether these IT products fulfil their security needs.

The CC is flexible in what to evaluate and therefore is not tied to the boundaries of IT products as commonly understood. Therefore, in the context of evaluation, the CC uses the term Target of Evaluation (TOE). A TOE is defined as a set of software, firmware and/or hardware possibly accompanied by guidance.

The CC gives consumers, especially in consumer groups and communities of interest, an implementation-independent structure, termed the Protection Profile (PP), in which to express their security requirements in an unambiguous manner. These requirements are contained in an implementation-dependent construct termed the Security Target (ST). This ST may be based on one or more PPs to show that the ST conforms to the security requirements from consumers as laid down in those PPs. Whereas an ST always describes a specific TOE (e.g. the MinuteGap v18.5 Firewall), a PP is intended to describe a TOE type (e.g. firewalls). The same PP may therefore be used as a template for many different STs to be used in different evaluations. Common Criteria also defines different levels of evaluation called Evaluation Assurance Levels (EAL), from 1 to 7, described in Figure 3.

EAL 1 – Functionally Tested. Provides analysis of the security functions, using a functional and interface specification of the TOE, to understand the security behaviour. The analysis is supported by independent testing of the security functions.

EAL 2 – Structurally Tested. Analysis of the security functions using a functional and interface specification and the high-level design of the subsystems of the TOE. Independent testing of the security functions, evidence of developer "black box" testing, and evidence of a development search for obvious vulnerabilities.

EAL 3 – Methodically Tested and Checked. The analysis is supported by "grey box" testing, selective independent confirmation of the developer test results, and evidence of a developer search for obvious vulnerabilities. Development environment controls and TOE configuration management are also required.

EAL 4 – Methodically Designed, Tested and Reviewed. Analysis is supported by the low-level design of the modules of the TOE, and a subset of the implementation. Testing is supported by an independent search for obvious vulnerabilities. Development controls are supported by a life-cycle model, identification of tools, and automated configuration management.

EAL 5 – Semi-formally Designed and Tested. Analysis includes all of the implementation. Assurance is supplemented by a formal model and a semiformal presentation of the functional specification and high-level design, and a semiformal demonstration of correspondence. The search for vulnerabilities must ensure relative resistance to penetration attack. Covert channel analysis and modular design are also required.

EAL 6 – Semi-formally Verified Design and Tested. Analysis is supported by a modular and layered approach to design, and a structured presentation of the implementation. The independent search for vulnerabilities must ensure high resistance to penetration attack. The search for covert channels must be systematic. Development environment and configuration management controls are further strengthened.

EAL 7 – Formally Verified Design and Tested. The formal model is supplemented by a formal presentation of the functional specification and high-level design showing correspondence. Evidence of developer "white box" testing and complete independent confirmation of developer test results are required. Complexity of the design must be minimised.

Figure 3: Definition of EALs from Common Criteria

The CC recognises two types of evaluation: an ST/TOE evaluation and an evaluation of PPs. The ST evaluation is carried out by applying the Security Target evaluation criteria to the Security Target. The precise method to apply the ASE criteria is determined by the evaluation methodology that is used. The TOE evaluation is more complex. The principal inputs to a TOE evaluation are the evaluation evidence, which includes the TOE and ST, but will usually also include input from the development environment, such as design documents or developer test results. The TOE evaluation consists of applying the Security Assurance Requirements (SARs, from the Security Target) to the evaluation evidence. The precise method to apply a specific SAR is determined by the evaluation methodology that is used. The evaluation results also identify the PP(s) to which the TOE claims conformance. Figure 4 portrays the mandatory content for a PP.

Figure 4: Document structure of a protection profile.

A PP is designed to be a baseline defined by a group of IT developers, who then agree that all IT products of this type that they produce will meet this baseline, and to be a security specification at a relatively high level of abstraction. A PP should, in general, not contain detailed protocol specifications, detailed descriptions of algorithms and/or mechanisms, long descriptions of detailed operations, etc.

An important end-user complaint is that the evaluation process takes too long, while product vendors often say that the evaluation costs are too high. Industry changes that bring new vendor technologies, product development approaches, and shorter time to market serve to exacerbate these problems. To address these needed improvements, the CCRA Management Committee (CCMC) announced in September 2012 a shift away from harmonizing the CC/CEM processes among the divergent and growing evaluation schemes, to instead focus on the development of new-style Protection Profiles called collaborative Protection Profiles, or cPPs. cPPs are developed by International Technical Communities, or iTCs. cPPs move away from Protection Profiles of the past, which were developed without strong engagement and endorsement of all CCRA participant nations.

Building on this, there are currently four collaborative Protection Profiles (cPPs), which are PPs that have been created through a collaborative process. The intention is to address the historical stovepipe nature of government-specific PPs whilst defining better requirements and testing methodology through industry engagement. These cPPs are the collaborative Protection Profile for Stateful Traffic Filter Firewalls, the two collaborative Protection Profiles for Full Drive Encryption (Authorization Acquisition, and Encryption Engine), and the collaborative Protection Profile for Network Devices. The last one is the most interesting for our purpose, since its TOE is a network device. It provides a minimal set of security requirements expected of all network devices, targeting the mitigation of a set of defined threats. All the security functional requirements specify some evaluation tests focused on the functional requirement and on one specific method of performing it, for example authentication with elliptic curves.

However, CC certifies a particular version of the product in certain configurations. Any changes to the configuration, or any updates to the product that affect the Target of Evaluation (TOE), which is the part of the product that is evaluated, invalidate the certification. This is not a desirable situation, given that products evolve and are updated at a frantic pace, and the certification must not be frozen to a specific version of the product. Re-certification after changes are made to the product is not mandatory, but should be considered case by case. In addition, certification must complement, rather than conflict with, existing safety certification mechanisms. Finally, it fails to deal satisfactorily with systems that are patched frequently, as operating systems now are; observers of the operating-system patching cycle and vulnerability scene have come to the conclusion that the Common Criteria is no more than a bureaucratic exercise whose costs far outweigh the benefits [10].

There have been other criticisms of Common Criteria: for example, the effort and time necessary to prepare the evaluation evidence and other evaluation-related documentation is so cumbersome that, by the time the work is completed, the product under evaluation is generally obsolete. In addition, CC evaluation is a costly process in terms of money, and it focuses primarily on assessing the evaluation documentation, not on the actual security, technical correctness or merits of the product itself [5].

2.3 Towards a benchmarking and certification methodology for security in large-scale IoT scenarios

As already mentioned, security aspects represent one of the major barriers for the establishment of large-scale IoT deployments. In this context, having the ability to assess and compare different security approaches or devices in an objective way is key to guaranteeing that end users are able to trust these IoT infrastructures. However, a proper benchmarking and certification methodology for security in IoT must overcome different obstacles that are inherent to this paradigm. On the one hand, the high degree of diversity and heterogeneity of devices and products conflicts with the need for objective comparisons regarding security aspects. On the other hand, due to the dynamism of typical IoT environments, the benchmarking and certification approach must be able to adapt its behaviour according to the changing conditions in which the product will be operating. Towards this end, a clear identification of threats and vulnerabilities is key to guarantee the success of the approach. In addition, the methodology must cope with the business requirements and needs of the IoT market. This means that security certification approaches should be efficient and cheap, so that the product's market launch is not delayed.

In this sense, while CC can be considered the main candidate for a security certification standard, it presents significant limitations that must be taken into account before applying it in the IoT ecosystem. Some of them have already been described in the previous section, especially regarding the time and cost that are required in certain cases. On the one hand, a high assurance level certification (such as EAL4) could take a long time (even 2 years), which can become an important obstacle for market purposes. On the other hand, IoT device manufacturers and vendors need to make a significant effort for the evaluation and certification tasks of their products, which, again, conflicts with their business requirements. The dynamic nature of the IoT makes security certification of devices more difficult to carry out. Typically, an IoT device goes through different stages during its lifecycle [12], in which it changes and evolves (e.g. due to the application of a security patch from the manufacturer). However, the CC methodology is intended to certify a specific version of a product, so any change to the configuration or any update to the product that affects the TOE, which is the part of the product that is evaluated, may invalidate the certification [13]. Furthermore, the cumbersome specification of the CC approach makes the comparison of certified products difficult to accomplish. This is a significant issue, since in the IoT non-expert users should be able to assess and compare different devices from the security perspective. In order to cope with some of these major needs, the ECSO WG1 has stated a set of roadmap elements for 2017:

1) Evaluation of all existing testing/certification schemes across Europe and globally, against various properties such as product domain applicability, security assurance levels, type of vulnerability assessment, time to market, costs and agility.

2) Benchmarking and identifying the relevance of each existing scheme as per the requirements of both the public and private sectors.

3) Mapping and developing opportunities for harmonization of existing schemes.

4) Developing "best practices" solutions within the sub-areas, moving toward a "harmonized" approach to cyber security within a consensus-based environment.

5) Working with public sector partners to address mutual recognition of "future" schemes.

6) Accomplishing a "fast track" process to achieve actual standards.

7) Implementing and piloting these testing and certification solutions to demonstrate effectiveness and cost efficiency, as well as customer acceptance and trust.

These points represent some of the major requirements and needs for the adoption of a European benchmarking and certification approach for IoT security. In this sense, one of the main goals of the ARMOUR project is to build a benchmarking methodology for security, in order to provide a connection between testing and labelling, as the baseline for certification. Considering the limitations of current certification approaches, the intended certification scheme aims to provide an efficient (in cost and time) labelling approach for the different experiments. On the one hand, benchmark results will be used as the baseline for this certification process, so that end users (in addition to testers and experimenters) are able to assess and compare different security solutions, or security aspects in different products. Furthermore, these results will be made available via the FIESTA IoT/Cloud Infrastructure. On the other hand, benchmark results will be obtained from the proposed methodology, which is described in the next section. It should be noted that this methodology is built by using different technologies and approaches for testing and experimentation. Specifically, for the different experiments already defined within the project, a Model-Based Testing (MBT) [14] approach will be used for modelling these experiments. Then, tests will be executed through the Testing and Test Control Notation (TTCN-3) [15]. These benchmark results will be used for the realization of the labelling and certification approach, which will be defined within WP4.


3 ARMOUR methodology for benchmarking security and privacy in IoT

This section is focused on the definition of the main stages composing the ARMOUR benchmarking methodology. Before this description, we include the main motivation and requirements for benchmarking security and privacy in large-scale IoT deployments.

3.1 Benchmarking for security and trust in IoT

Moving from isolated IoT domains to real large-scale deployments requires a better understanding and consensus about actual implications of envisioned infrastructures. For example [16]:

• Is the deployment doing what it is supposed to do?
• Is it the best way to achieve the expected goal?
• Is the deployment creating unexpected issues?
• Is the deployment achieving the expected economic performance?

Providing answers to these questions is of interest to people involved in IoT deployments to identify good practices, avoid traps and make good choices. In this sense, the benchmarking of security and trust aspects is crucial to guarantee the success of large-scale IoT deployments.

Benchmarking IoT deployments should meet different objectives. On the one hand, as we are still in a disharmonized landscape of technologies and protocols, it is quite difficult for experimenters and testers to have a whole vision of the difficulties and opportunities of IoT deployments. Consequently, there is a need to follow the deployment process, to understand how the deployment works and which security and trust technologies are involved, as well as the impact of security flaws related to such technologies. On the other hand, and also related to this fragmented landscape of security solutions for the IoT, large-scale deployments will be based on a huge amount of different technologies and protocols. Therefore, the identification of common metrics is crucial to be able to assess and compare such solutions in order to identify good security and trust practices and guidelines. Towards this end, one of the main goals of the project stems from the need to identify a suitable benchmarking methodology for security and trust in IoT, as a baseline for the certification process within the scope of WP4. This methodology, as shown in the next sections, is based on the identification of different metrics associated with different functional blocks, in order to benchmark security and privacy on the different experiments designed in the project. Specifically, micro-benchmarks provide useful information to understand the performance of the subsystems associated with a smart object. Furthermore, they can be used to identify possible performance bottlenecks at architecture level and allow embedded hardware and software engineers to compare and assess the various design trade-offs associated with component-level design. Macro-benchmarks provide statistics relating to a specific function at application level and may consist of multiple component-level elements working together to perform an application-layer task.

The ARMOUR project is focused on carrying out security testing and certification in large-scale IoT deployments. Between these two processes, benchmarking is intended to provide different results from testing to serve as the baseline for the certification scheme. Towards this end, for both micro-benchmarking and macro-benchmarking, the methodology will be based on the identification of different metrics per functional block (e.g. authentication). Benchmarking results will help to increase the trust level on the assurance of security properties in IoT products and solutions. Based on this, the next section provides a description of the ARMOUR benchmarking methodology.
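As a purely illustrative aside, the following Python sketch shows one way in which per-functional-block metrics could be recorded and distinguished as micro- or macro-benchmarks; the metric names, units and values are hypothetical and are not defined by ARMOUR.

```python
# Hypothetical sketch of how metrics per functional block could be recorded;
# metric names, units and values below are illustrative assumptions only.
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    MICRO = "micro"   # subsystem-level measurement on the smart object
    MACRO = "macro"   # application-level, end-to-end measurement

@dataclass
class BenchmarkMetric:
    functional_block: str   # e.g. "authentication"
    name: str               # e.g. "dtls_handshake_latency"
    level: Level
    unit: str
    value: float

results = [
    BenchmarkMetric("authentication", "dtls_handshake_latency", Level.MICRO, "ms", 412.0),
    BenchmarkMetric("authentication", "bootstrap_completion_time", Level.MACRO, "s", 3.2),
]

for m in results:
    print(f"[{m.level.value}] {m.functional_block}/{m.name} = {m.value} {m.unit}")
```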

3.2 Approach overview

A high-level overview of the proposed benchmarking methodology is shown in Figure 5.

[Figure 5 depicts the five stages of the methodology (experiment design, test design, test generation, test execution and labelling) together with their main artefacts: the experiment scope and description, the identified threats and vulnerabilities, vulnerability patterns and test patterns, test models and generated test suites, the test adapter for in-house and large-scale execution, the benchmark results and the labelling scheme; the figure also marks the test preparation and test modelling phases.]

Figure 5: ARMOUR Benchmarking methodology overview

Following the figure, the methodology comprises 5 main stages:

1. Experiment design. A concrete experiment is defined in the context of a specific IoT application scenario or use case.

2. Test design. The description of the experiment is used to identify different threats and vulnerabilities that can be tested over it. These vulnerabilities and threats are identified from a set of already established vulnerability patterns, and used as an input to generate test patterns in order to define the procedure to test a specific threat.

3. Test generation. Based on test patterns, a test model formalizes a specific subset of the functionality of the experiment. Using these models, a set of test suites are generated to be then executed.

4. Test execution. From the set of test suites, a set of test adapters are defined to guarantee that the suites can be executed under different environments and conditions. Specifically, in the context of the project, these tests are intended to be executed both in in-house scenarios and in a large-scale setting, through the use of FIT IoT-Lab.

5. Labelling. The results from the execution of these tests are then used during the certification process by making use of a security labelling scheme.

In the following sections, a more detailed overview of these stages is provided.


3.3 Experiment design

The ARMOUR project considers a large-scale, experimentally-driven research approach due to IoT complexity (high dimensionality, multi-level interdependencies and interactions, non-linear highly-dynamic behaviour, etc.). This research approach makes it possible to experiment with and validate research technological solutions in large-scale conditions, very close to real-life environments. As a baseline for the benchmarking methodology, different experiments have been designed to test different security technologies, protocols and approaches that are intended to be deployed in typical IoT scenarios:

• EXP 1: Bootstrapping and group sharing procedures
• EXP 2: Sensor node code hashing
• EXP 3: Secured bootstrapping/join for the IoT
• EXP 4: Secured OS / Over-the-air updates
• EXP 5: Trust aware wireless sensors networks routing
• EXP 6: Secure IoT Service Discovery
• EXP 7: Secure IoT platforms

This set of experiments constitutes the basis for obtaining benchmarking results from the application of the methodology to such scenarios. In particular, these experiments have been designed to test specific security properties of technologies and protocols, such as the Datagram Transport Layer Security (DTLS) [17], the main security binding for the Constrained Application Protocol (CoAP) [18]. These experiments also involve the use of different cryptographic schemes and libraries that will be tested, so that they can be certified according to the benchmarking results. By considering different security aspects during this stage, and following an approach similar to the EALs in CC, a risk analysis is performed regarding the assurance level required for the environment where the device will operate.

The design and description of such experiments is based on the definition of different required components and testing conditions, to ease the test design, modelling and execution that are defined in the following subsections. In particular, by considering ARMOUR experiments, Section 4 provides the description of the benchmarking methodology application to EXP1.

3.4 Test design

During test design, experiments are set up and prepared through the specification of security test patterns that describe the testing procedures. Considering the set of different experiments, as well as the security protocols and technologies involved in each of them, this stage is focused on the definition of different tests to assess the fulfilment of specific security properties in those experiments. In particular, Figure 6 shows these security properties, together with a description of each of them. Such properties have been extracted from some of the most referenced security aspects that can be found in the current IoT literature [31][32][33][34][35]. They can be security properties as such, like authentication and integrity, or resistance to certain attacks, such as MITM and replay attacks.


• Authentication: the endpoints should be legitimate.
• Resistance to replay attack: there should be mechanisms to detect duplicate or replayed messages.
• Resistance to dictionary attack: the key used in the communication should not be a dictionary word, or there should be a mechanism to protect the key from this type of attack.
• Resistance to DoS attacks: the devices should be protected from DoS attacks. They should have mechanisms to avoid them or to react when they are happening.
• Integrity: data exchanged between endpoints should not be modifiable. If this cannot be ensured, then any change should be detected.
• Confidentiality: sensitive transmitted data should be read only by the communication endpoints.
• Resistance to MITM attacks: the endpoints should have mechanisms to detect or avoid a MITM attack.
• Authorization: services should be accessible only to endpoints that have the right to access them.
• Availability: the communication endpoints should always be reachable and should not be made inaccessible.
• Fault tolerance: the overall service should be delivered even when a number of atomic services are faulty.
• Anonymization: transmitted data related to the identity of the endpoints should not be sent in clear.

Figure 6: Security properties considered.

The proposed methodology is based on the use of these properties and the set of vulnerabilities identified in ARMOUR D1.1 (Section 8.2) [19], which are extracted from oneM2M vulnerabilities [20]. From these properties and the set of vulnerabilities that are described, different tests can be designed in order to prove that the experiments above actually satisfy the set (or a subset) of security properties.

It should be pointed out that the identified security properties are intended to serve as a starting point to build a more generic, stable and holistic approach for security certification in IoT scenarios. This set will evolve during the project to cover other security vulnerabilities, and this evolution will be reflected in the next deliverables.

From these security vulnerabilities and properties, security test patterns are identified to define the testing procedure for each of them. In the context of the project, D2.1 [21] already identifies a set of test patterns to be considered for the seven experiments. With all this information, test patterns can be designed by associating security vulnerabilities and properties. These tests will be defined following the schema of Figure 7, which includes


a description for each field. The main part of the test definition is the test description, which must include the steps to follow when executing the test and the conditions under which the test is considered satisfactory or not.

Test pattern ID: The identifier of the test pattern
Stage: The specific stage or step of the experiment to which the pattern refers
Protocol: The technology or protocol related to the test pattern
Property tested: The security property related to the test pattern
Test diagram: A figure with the main components involved in the test pattern
Test description: Description of the steps and conditions related to the test pattern
References: Vulnerabilities related to this test pattern

Figure 7: Test pattern template associating security vulnerabilities and properties

It should be noted that this template is similar to the approach followed in D2.1. However, the proposed template is intended to identify the association or relationship between the test patterns and the security properties and vulnerabilities. The identification of such relationships aims to foster the understanding of the benchmarking methodology approach towards a security certification scheme for the IoT.
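As an illustration of how the template of Figure 7 might be instantiated, the hypothetical example below (written as a Python dictionary purely for readability) fills in the fields for a replay-attack test on a CoAP/DTLS exchange. The identifier, stage, diagram and referenced vulnerability are invented placeholders, not values taken from D2.1.

```python
# Hypothetical instantiation of the test pattern template of Figure 7.
# All field values (ID, stage, diagram, referenced vulnerability) are illustrative placeholders.
test_pattern = {
    "test_pattern_id": "TP-EXAMPLE-01",
    "stage": "Bootstrapping (key establishment between node and gateway)",
    "protocol": "CoAP over DTLS",
    "property_tested": "Resistance to replay attack",
    "test_diagram": "node <-> attacker (captures and re-injects records) <-> gateway",
    "test_description": (
        "1) Complete a legitimate DTLS handshake and send one protected CoAP request. "
        "2) Capture the DTLS record and re-inject it towards the gateway. "
        "3) The test passes if the replayed record is rejected (anti-replay window) "
        "and fails if the gateway processes it a second time."
    ),
    "references": ["<hypothetical oneM2M vulnerability entry>"],
}
```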

3.5 Test generation

From the test patterns designed in the previous stage, real tests are defined during test generation in order to validate the different security properties. For testing purposes, different strategies can be employed; indeed, software testing is typically the process of verifying that different requirements regarding the expected behaviour of a System Under Test (SUT) are fulfilled. In this sense, software security testing [22] is intended to validate that security requirements are satisfied on a specific SUT. According to [23], security testing techniques can be classified, according to their test basis within the secure software development lifecycle, into four different types:

• Model-based security testing, which relies on the requirements and design models created during analysis and design.

• Code-based testing, which focuses on the source and byte code created during development.

• Penetration testing and dynamic analysis on running systems, either in a test or production environment.

• Security regression testing, performed during maintenance.

For the proposed methodology, ARMOUR relies on the Model-Based Testing (MBT) approach [24] (of which Model-Based Security Testing (MBST) is a specialisation). MBT is mainly based on the automatic generation of test cases from a model of the SUT; consequently,


MBST aims to validate security properties against a model of the SUT, in order to identify whether such properties are fulfilled in the model. Specifically, in the context of the project, the MBT approach is mainly driven by the Smartesting CertifyIt tool [25]. This tool is already used for enterprise software in different business areas, as well as in the context of security testing [26]. CertifyIt allows a system to be modelled with the Unified Modelling Language (UML), expressing its behavior with the Object Constraint Language (OCL) [27]. Functional tests are then obtained by applying structural coverage of the OCL code describing the operations of the IoT SUT. The CertifyIt tool is used together with the Testing and Test Control Notation version 3 (TTCN-3) language [28] for creating executable tests. In this way, TTCN-3 test cases generated from the MBT model can be executed with Titan (http://www.ttcn-3.org/index.php/tools/tools-noncom/112-non-comm-titan), an open-source TTCN-3 toolset available via the Eclipse Foundation. TTCN-3 is widely used by the European Telecommunications Standards Institute (ETSI) and the ITU due to its flexibility across different protocols, services and systems. The main goal of using TTCN-3 in the proposed methodology is the systematic and automatic testing of security properties in IoT devices, improving efficiency and scalability.
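The sketch below illustrates, in a simplified way, what the output of this generation step looks like conceptually: an abstract test case is a sequence of model operations plus an expected verdict, later concretised into TTCN-3 and executed with Titan. The structure and names used here are assumptions for illustration and do not reproduce the CertifyIt internals.

```python
# Conceptual sketch of an abstract test case produced by model-based test generation.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AbstractStep:
    operation: str               # operation of the IoT SUT model, e.g. "groupKeyRequest"
    parameters: Dict[str, str]   # symbolic values chosen to cover one OCL behaviour branch

@dataclass
class AbstractTestCase:
    test_id: str
    steps: List[AbstractStep]
    expected_verdict: str        # oracle derived from the model ("pass" or "fail")

# One abstract test covering the nominal branch of the group key request:
tc = AbstractTestCase(
    test_id="EXP1_TC_01",
    steps=[
        AbstractStep("groupKeyRequest", {"credentials": "VALID"}),
        AbstractStep("groupKeyResponse", {"groupKey": "RETURNED"}),
    ],
    expected_verdict="pass",
)
```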

3.6 Test execution

Once suitable tests are generated, they must be adapted [13] in order to be executed on different environments, each with its own interfaces and behavior. In the context of the project, generated tests can be executed on an in-house or an external large-scale testbed. Both testbed approaches aim to serve as platforms for generating benchmark results from the execution of security tests. In particular, during test execution, the following tasks can be carried out [30]:

• Measure, in order to collect data from experiments.

• Pre-process, to obtain “clean” data that eases assessment and comparison among different security technologies and approaches.

• Analyze, to draw conclusions from benchmarking results as a preliminary step for labelling and certification.

• Report, communicating the conclusions inferred from the experiment results. A minimal sketch of this chain of tasks is given below.
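As a purely illustrative example (not project code), the four tasks can be chained as a small pipeline over the raw measurements collected on the testbed; the function names and the sample metric are assumptions of this sketch.

```python
# Minimal sketch of the measure / pre-process / analyze / report chain.
import statistics

def measure(run_once, repetitions=5):
    """Collect raw data points from the testbed (e.g. time to obtain the group key)."""
    return [run_once() for _ in range(repetitions)]

def preprocess(samples):
    """Produce 'clean' data: discard outliers so results remain comparable."""
    mean, dev = statistics.mean(samples), statistics.pstdev(samples)
    kept = [s for s in samples if abs(s - mean) <= 2 * dev]
    return kept or samples

def analyze(samples):
    """Derive the benchmark indicators later used for labelling and certification."""
    return {"runs": len(samples), "mean": statistics.mean(samples), "max": max(samples)}

def report(indicators):
    """Communicate the conclusions inferred from the experiment results."""
    for name, value in indicators.items():
        print(f"{name}: {value}")

# Example: report(analyze(preprocess(measure(lambda: 4.57))))
```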

As an example of an in-house environment, PEANA is an experimentation platform based on OMF/SFA. An experimenter using this platform first accesses the web portal to schedule and book the experiment. After booking the required components, the platform grants access via SSH to the Experiment Controller, where the experiment takes place. The experimenter then defines the experiment using the OMF Experiment Description Language (OEDL) [29], indicating the Resource Controllers to be used, i.e. the entities responsible for managing the scheduled IoT devices. This definition can also include the new firmware to be tested and the gathering of returned information. After the experiment is executed, the experimenter can analyze its performance according to the obtained results.


In the case of large-scale environments, experimentation is carried out through the FIT IoT-LAB testbed. FIT IoT-LAB has already been extended to support security and trust experimentation, providing more than 2000 IoT nodes for developing and deploying large-scale IoT experiments. The platform offers a very large scale infrastructure facility suitable for testing small wireless sensor devices and heterogeneous communicating objects (https://www.iot-lab.info/). Test execution is key to obtaining the information used during labelling and certification to assess the security technologies involved in the experiments. In addition, these benchmark results are intended to be fed into the FIESTA semantic testbed and serve as evidence to validate the security certification label defined in phase one.

3.7 Labelling

During this last stage of the methodology, the benchmarking results from the previous stage are used as input for labelling and certification purposes. The main purpose of this stage is to check whether the theoretical results expected during the initial stages are actually obtained after test execution. This implies the design of a labelling scheme specifically tailored to IoT security and trust aspects, so that different security technologies and approaches can be compared in order to certify different security aspects of IoT devices. The establishment of this scheme is key to increasing the trust of IoT stakeholders in large-scale deployments. However, labelling approaches for IoT need to take into account the dynamism of IoT devices during their lifecycle (e.g. operating in changing environments), which makes security certification a challenging process.

As an initial approach to labelling for IoT security, we provide below a description of the main aspects currently being considered. These aspects will be refined in subsequent deliverables to build a solid labelling approach for security aspects in IoT environments. As already mentioned in previous ARMOUR deliverables, labelling has to take into account the context of the scenario being tested and the certification execution. For this reason, and based on the CC approach, three main aspects will be considered for inclusion in the label:

• TOE (Target of Evaluation): In CC, a TOE is defined as a set of software, firmware and/or hardware, possibly accompanied by guidance. In this case, a TOE is defined as a context, for example health, industry or home automation. The TOE includes all the devices of the context; in home automation, for example, the TOE could include the fire sensor, the light bulbs, the washing machine, etc.

• Profiles (level of protection): Low, medium and high. The level of protection is related to the threats that can be avoided in the tested scenario.

• Certification execution: The proposed certification execution has 4 levels, which are shown in Figure 8. This aspect is intended to be further extended by the certification approach.


Figure 8: Certification execution levels.

In order to be able to label a scenario, we have to consider a set of metrics associated with the execution of the tests, such as:

• Detectability
• Impact
• Likelihood (difficulty + motivation/benefits)
• Difficulty in obtaining a valid message
• Percentage of the communication protected with integrity
• Recoverability
• Requests per second necessary
• Sensitivity of the data
• Time
• Length of the key
• Ease of compromising the server

Based on these metrics, we have defined different levels of security, each associated with a mark, for a subset of the security properties identified in Figure 6. This association between security properties and marks is shown in Figure 9.

Authentication (client/server):
  0: Mutual and strong
  1: Strong server, weak or no client authentication
  2: Strong client, weak or no server authentication
  3: Weak or no authentication

Resistance to replay attacks:
  0: Protected
  1: Not protected, but a valid message cannot be obtained
  2: Not protected, and a valid message can be obtained with difficulty / weak protection
  3: Not protected, and a valid message can be obtained easily

Resistance to dictionary attacks:
  0: Not applicable
  1: Strong key
  2: Weak key

Integrity:
  0: Total
  1: Partial
  2: None

Resistance to DoS attacks:
  0: Minimum state
  1: Big state

Confidentiality:
  0: Total with secure encryption
  1: Partial with secure encryption
  2: Total with insecure encryption
  3: Partial with insecure encryption
  4: None

Resistance to MITM attacks:
  0: Detectable
  1: Not detectable

Figure 9: Association between security properties and marks based on metrics

As already mentioned, the context must be taken into account in the labelling process. For this reason, we consider an additional parameter, the risk, computed as the product of likelihood and impact. Likelihood is related to the probability of an attack happening, taking into account the benefit an attacker could obtain, whereas impact is related to the damage produced in the scenario in terms of money, sensitive data leaked, scope, etc. These parameters can vary for each vulnerability; their levels are shown in Figure 10.

Likelihood:
  0: Null benefit
  1: Medium benefit
  2: High benefit

Impact:
  0: Little damage and recoverable
  1: Limited damage (scope, monetary losses, sensitive data, etc.)
  2: High damage

Figure 10: Marks for the likelihood and impact parameters

Once each threat and scenario has been given a score (risk and mark), we can define the TOE and the profiles, specifying the minimum level of security they must provide, i.e. the minimum score (in terms of risk and mark) for each property. If the scenario under test achieves the level of security of several profiles, it is labelled with the highest of them; an upper level satisfies the requirements of the lower levels. The next section describes an example of applying the methodology to a concrete experiment.


4 Applying ARMOUR benchmarking methodology to concrete experiments

In order to exemplify the application of the proposed methodology, this section describes the resulting approach for a specific experiment, EXP1, defined in the context of the project. It should be noted that a more detailed description of this experiment is included in D1.2.

4.1 EXP1 Experiment design

EXP1 is mainly motivated by the need for suitable mechanisms for security credential management and distribution, so that IoT devices can interoperate securely during their operation. These key and credential management aspects need to be built on top of a secure approach for bootstrapping, since this represents the root of trust of an IoT device's lifecycle. Specifically, EXP1 is based on the use of the CoAP and DTLS protocols, so that IoT devices can request security credentials (authorization tokens and group keys in this case), which are later used for secure operation. Figure 11 provides an overview of the required interaction among the Device, the Gateway and the Credential Manager, which is responsible for generating and distributing such credentials. A more detailed description of the experiment can be found in D1.2 [21].

[Sequence diagram: the Device sends GroupKeyRequest(credentials) through the Gateway to the Credential Manager, which performs RequestProcessing(credentials) and returns GroupKeyResponse(groupKey); all exchanges run over CoAP+DTLS.]

Figure 11: Flow of messages for EXP1
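From the unconstrained side of this flow, the credential request can be prototyped with any CoAP client library. The sketch below uses the aiocoap Python library against a hypothetical /groupkey resource; the URI, resource name and payload format are assumptions, and the DTLS channel (handled by tinyDTLS/Erbium on the constrained device) is assumed to be established by the underlying stack.

```python
# Hedged prototype of the GroupKeyRequest/GroupKeyResponse exchange with aiocoap.
# Resource name, address and payload format are illustrative assumptions.
import asyncio
from aiocoap import Context, Message, POST

CM_URI = "coap://[2001:db8::1]/groupkey"   # hypothetical Credential Manager endpoint

async def request_group_key(credentials: bytes) -> bytes:
    ctx = await Context.create_client_context()
    request = Message(code=POST, uri=CM_URI, payload=credentials)
    response = await ctx.request(request).response   # GroupKeyResponse
    return response.payload                          # the delivered group key

if __name__ == "__main__":
    key = asyncio.run(request_group_key(b"device-credentials"))
    print("received", len(key), "bytes")
```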

4.2 EXP1 Test design

By considering the set of security properties described in Section 3.4 and the vulnerabilities from D1.1, we have defined, in the context of EXP1, the association between such properties and vulnerabilities; Table 1 shows this association. The mapping is one-to-one except for authentication, which is related to vulnerabilities V1 to V5, and V13, which is related to two different security properties, confidentiality and resistance to MITM attacks. In fact, there are two test patterns associated with V13, one for each property, as shown in the next table.


Vulnerability (ARMOUR D1.1) | Mapping with ARMOUR D1.2 (security property)

V1 Discovery of Long-Term Service-Layer Keys Stored in M2M Devices or M2M Gateways | Authentication
V2 Deletion of Long-Term Service-Layer Keys Stored in M2M Devices or M2M Gateways | Authentication
V3 Replacement of Long-Term Service-Layer Keys Stored in M2M Devices or M2M Gateways | Authentication
V4 Discovery of Long-Term Service-Layer Keys Stored in M2M Infrastructure | Authentication
V5 Deletion of Long-Term Service-Layer Keys Stored in M2M Infrastructure Equipment | Authentication
V8 Alteration of M2M Service-Layer Messaging Between Entities | Integrity; Authentication; Resistance to MITM attacks
V9 Replay of M2M Service-Layer Messaging Between Entities | Resistance to replay attacks
V13 Eavesdropping / Man-in-the-Middle Attack | Confidentiality; Resistance to MITM attacks
V15 Buffer Overflow | Resistance to DoS attacks; Fault tolerance; Availability
V19 Insecure Cryptographic Storage | Resistance to dictionary attacks

Table 1: Association between security properties and vulnerabilities for EXP1

Table 2 lists the test patterns from D2.1 that correspond to the properties tested; we can see that MITM resistance and confidentiality are tested separately. The table also shows the vulnerabilities related to each test. For example, the authentication property is tested by means of test pattern TP_ID1, which considers vulnerabilities V1 to V5, as already mentioned. For EXP1, there are seven test patterns, one for each security property tested.


Test Pattern ID (ARMOUR D2.1) | Test Pattern Name (ARMOUR D2.1) | Related Vulnerabilities (ARMOUR D1.2) | Property Tested (EXP1)

TP_ID1 | Resistance to unauthorized access, modification or deletion of keys | V1, V2, V3, V4, V5 | Authentication
TP_ID4 | Resistance to alteration of requests | V8 | Integrity
TP_ID5 | Resistance to replay of requests | V9 | Resistance to replay attacks
TP_ID8 | Resistance to eavesdropping | V13 | Confidentiality
TP_ID15 | Prevention of dictionary attacks | V19 | Resistance to dictionary attacks
TP_ID16 | Resistance to man-in-the-middle | V13 | Resistance to MITM attacks
TP_ID2 | Resistance to DoS attacks | V15 | Resistance to DoS attacks

Table 2: Association of test patterns with security properties and vulnerabilities for EXP1

It should be pointed out that the whole set of test patterns for EXP1 is already included in D1.2.

4.3 EXP1 Test generation

According to the proposed methodology, once the test patterns are defined for an experiment, MBT is used to generate the model of the SUT. This modelling approach includes the generation of a UML class diagram for the experiment. Figure 12 shows the UML diagram generated for EXP1. Taking into account the set of entities defined in the diagram, the process can be described as follows:

1. The Sensor makes a Request to the Credential Manager in order to obtain a Group_Key.

2. The Credential Manager extracts the set of attributes of the Sensor that is included in the Request.

3. The Credential Manager generates a Group_Key associated to these attributes and sends it to the Sensor in a Response.

4. The Sniffer intercepts the messages exchanged between the Sensor and the Credential Manager and tries to get the Group_Key.

If the Sniffer cannot read the group key, the test is satisfactory; otherwise, it is not.
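A minimal sketch of this pass/fail oracle is shown below; the sniffer interface is a hypothetical placeholder.

```python
# Oracle for the test: it passes only if the group key cannot be recovered
# from the traffic observed by the sniffer between Sensor and Credential Manager.
from typing import Iterable

def group_key_confidentiality_verdict(captured_payloads: Iterable[bytes],
                                      group_key: bytes) -> str:
    leaked = any(group_key in payload for payload in captured_payloads)
    return "fail" if leaked else "pass"

# Example (hypothetical sniffer object exposing the captured payloads):
# verdict = group_key_confidentiality_verdict(sniffer.payloads, expected_group_key)
```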


Figure 12: UML diagram for EXP1

Following the proposed methodology, specific test cases can be generated from this model in TTCN-3 and executed in Titan to automate test execution.

4.4 EXP1 Test execution

The implementation of EXP1 has been carried out so that the experiment can be executed on both the in-house and the large-scale testbeds. An overview of the former is shown in Figure 13.

Figure 13: In-­house scenario overview for EXP1

In this case, for the client device, we have used an MSP430F5438A Rev H microcontroller, with 256 KB of flash memory and 16 KB of RAM. This device runs ContikiOS 2.7.


As the DTLS and CoAP protocols are required for EXP1, we have used the Erbium (http://people.inf.ethz.ch/mkovatsc/erbium.php) and tinyDTLS (https://projects.eclipse.org/projects/iot.tinydtls) libraries. Furthermore, a border router is required so that the client running inside ContikiOS can communicate with the Credential Manager, which runs outside ContikiOS. The complete scenario has been deployed inside the ARMOUR VM, running 32-bit Ubuntu with 2 GB of RAM.

In the case of the deployment on FIT IoT-LAB, Figure 14 shows an overview of the involved components.

Figure 14: Large scale scenario overview

The client and the border router are hosted in FIT IoT-LAB on M3 nodes. These devices are based on an STM32 (ARM Cortex-M3) microcontroller, offering 32-bit processing, an ATMEL radio interface in the 2.4 GHz band and a set of sensors. The OS embedded in the M3 nodes is ContikiOS.

The M3 node sends and receives data to/from the Credential Manager, which is located in a remote VM; this communication is made possible by the border router. The exchanges use CoAP over DTLS, with the tinyDTLS library for DTLS and Erbium for CoAP. The M3 devices must be reserved on IoT-LAB before the test campaign, and the Attribute Authority (AA) must be accessible via SSH.

A more complete description of required steps to run EXP1 in both facilities can be found in D1.2.

4.4.1 Scalability

The scalability of EXP1 has been demonstrated in FIT IoT-LAB by executing the experiment with several devices that request a group key from the Credential Manager (or Attribute Authority in this case) at the same time (Figure 15). The experiment has been replayed while increasing the number of devices up to 9. Execution in FIT IoT-LAB shows a high level of flexibility for large-scale deployments, without the need for additional changes or implementation effort. While this represents a preliminary study of the advantages of using this testbed, a more detailed scalability analysis will be delivered in upcoming deliverables.


Figure 15: Wireshark capture of EXP1 with several devices

4.5 EXP1 Labelling

For each test, we have to collect the result (whether the test has been satisfactory or not). However, other data is also important for deciding the level of security: we have to collect the time the experiment has taken and its initial parameters, such as key lengths, the cryptographic suite used, the specific versions of the protocols involved, etc.

In some test patterns, such as DoS attack protection, we also have to collect information about the number of attackers and the frequency of the requests. In this section we present an example of labelling for group sharing with DTLS. For this example, we have considered the metrics and marks already defined for the labelling stage (Section 3.7). The context is home automation, so the likelihood is set to 1 (medium benefit) and the impact to 1 (limited damage), since we consider that an attacker cannot obtain a high benefit from hacking a residential house and the damage that can be done is limited (a high electricity bill, for example). This is shown in Figure 16.

Likelihood: 1 (Medium benefit)
Impact: 1 (Limited damage: scope, monetary losses, sensitive data, etc.)

Figure 16: Marks for the likelihood and impact parameters for EXP1

In theory, and taking into account that the communication has two parts (the establishment of the secure communication channel with DTLS and the key/token request), this scenario achieves the security properties highlighted in Figure 17.


[Figure 17 reproduces the property/mark scales of Figure 9, highlighting the marks theoretically achieved by EXP1 for each property.]

Figure 17: Theoretical values of security properties and marks based on metrics for EXP1

Below, we show the result of the tests for EXP1 (the group sharing scenario) together with the profiles defined for home automation. For this TOE we have considered three profiles (low, medium and high); for each security property, the table indicates the minimum level required by each profile and the profiles whose requirement is met by the tested scenario.

In general, if a scenario reaches a specific level for a property, for example total integrity, it also satisfies the lower levels (partial and no integrity). To select a profile for the scenario, we therefore take the maximum profile satisfied by all the properties. The high profile offers a higher level of security than the medium and low profiles; for instance, confidentiality is not necessary in the low profile but is required in the higher profiles.


Property | Low profile | Medium profile | High profile | Profiles satisfied (Group sharing)

AuthN c/s | Mutual and strong | Mutual and strong | Mutual and strong | ALL
Resistance to replay attacks | Protected | Protected | Protected | ALL
Resistance to dictionary attacks | Strong key | Strong key | Strong key | ALL
Integrity | Partial | Partial | Total | MEDIUM, LOW
Resistance to DoS attacks | Big state | Big state | Big state | ALL
Confidentiality | Partial with secure ciphering | Partial with secure ciphering | Partial with secure ciphering | ALL
Resistance to MITM attacks | Detectable | Detectable | Detectable | ALL

OBTAINED PROFILE: MEDIUM

(TOE = home automation. Each profile cell indicates the minimum property level required; the last column lists the profiles whose requirement is met by the tested group sharing scenario.)


Analysing the result obtained by EXP1 for each property and comparing it with the level of security required by each profile, we can see that EXP1 satisfies all levels of security for all properties except integrity, since DTLS does not protect all the fields of the messages. As EXP1 has partial integrity protection, it obtains the medium and low profiles for this property. This is the property with the lowest level reached, so overall EXP1 can obtain the low and medium profiles, and we select the higher of them. Therefore, EXP1 obtains the medium profile in home automation.
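The profile selection logic described in Section 3.7 and applied here can be summarised with the following sketch. Marks follow the Figure 9 scales (lower is better); the per-profile thresholds are illustrative values consistent with the table above, not the official ARMOUR profiles.

```python
# Hedged sketch of profile selection: label the scenario with the highest profile
# whose per-property requirements are all met (marks follow Figure 9, 0 is best).

scenario_marks = {            # marks obtained by EXP1 (group sharing)
    "authentication": 0,      # mutual and strong
    "replay": 0,              # protected
    "dictionary": 1,          # strong key
    "integrity": 1,           # partial (DTLS does not protect all fields)
    "dos": 1,                 # big state
    "confidentiality": 1,     # partial with secure encryption
    "mitm": 0,                # detectable
}

# Worst mark each profile tolerates per property (illustrative thresholds).
profiles = {
    "low":    {"authentication": 0, "replay": 0, "dictionary": 1, "integrity": 1,
               "dos": 1, "confidentiality": 1, "mitm": 0},
    "medium": {"authentication": 0, "replay": 0, "dictionary": 1, "integrity": 1,
               "dos": 1, "confidentiality": 1, "mitm": 0},
    "high":   {"authentication": 0, "replay": 0, "dictionary": 1, "integrity": 0,
               "dos": 1, "confidentiality": 1, "mitm": 0},
}

def satisfies(marks, requirements):
    return all(marks[prop] <= limit for prop, limit in requirements.items())

achieved = [name for name in ("low", "medium", "high")
            if satisfies(scenario_marks, profiles[name])]
print("Obtained profile:", achieved[-1] if achieved else "none")   # -> medium
```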

A change in the metrics used (key length, percentage of the communication protected with confidentiality or integrity, cryptographic suite, context, etc.) could lead to a change in the label. For example, if we decide not to encrypt the communication, the confidentiality property will only satisfy the low profile, which will cause the whole scenario to obtain the low profile. On the other hand, if we decide to encrypt the whole communication, the scenario will obtain the high profile. However, if we repeat EXP1 changing the context from home automation to military, the key length used will no longer be sufficient and the confidentiality property will take the value "Partial with insecure ciphering", which satisfies the medium and low profiles. As a result, the profile obtained would still be medium despite this change. For this reason, we will consider a finer granularity for the profiles in the future.

Starting from this initial proposal, subsequent deliverables will provide an enriched and standardized approach towards a more fine-grained certification and labelling solution for the IoT.

4.5.1 Example of benchmarking: Confidentiality

4.5.1.1 Experiment 1

In order to benchmark the confidentiality property, we execute EXP1 in FIT IoT-LAB with two devices and analyse the trace with Wireshark. In this way, we can analyse the content of the packets, determine what is encrypted or in clear, and establish what information can be obtained from them in order to decide the level of confidentiality EXP1 achieves.

Figure 18 shows the Wireshark trace obtained from the execution of EXP1. A device takes 4.57 seconds to obtain the group key from the CM. We can see the general flow of the DTLS exchange and some packets labelled "Application Data", which contain the encrypted DTLS Finished message and the delivery of the group key, also encrypted. The payloads of these data packets are completely encrypted, as shown in Figure 19. However, the DTLS version is visible, which can be a source of information for an attacker if it is a weak version or one with exploitable bugs. The epoch and sequence number are also visible, and could be exploited in a replay attack more elaborate than simply capturing and resending a message without modification.


Figure 18: Wireshark capture of EXP1 (device 1)

Figure 19: Content of an application data message.

The rest of the packets in the DTLS communication are in clear. In Figure 20 we can see the session ID, the cookie and the random number, which could also be exploited in an elaborate replay attack. It is worth noting that the cipher suite and key length to be used are also visible. This is a good source of information for attackers, who can use it to determine whether the key length is weak or the cipher suite has known security flaws.


Figure 20: Client Hello message content.
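This manual inspection can be partly automated. The following sketch uses the pyshark wrapper around tshark to separate the fully encrypted application-data records from the handshake records that remain readable; the capture file name and the exact field-access names are assumptions of this sketch.

```python
# Hedged sketch: count encrypted vs. cleartext DTLS records in the EXP1 capture.
import pyshark

APPLICATION_DATA = "23"   # DTLS record content types (RFC 6347)
HANDSHAKE = "22"

cap = pyshark.FileCapture("exp1_device1.pcap", display_filter="dtls")

encrypted, cleartext = 0, 0
for pkt in cap:
    # Mirrors the Wireshark field dtls.record.content_type; the field name used
    # here is an assumption of this sketch.
    content_type = pkt.dtls.get_field_value("record_content_type")
    if content_type == APPLICATION_DATA:
        encrypted += 1      # e.g. the encrypted delivery of the group key
    elif content_type == HANDSHAKE:
        cleartext += 1      # ClientHello, ServerHello, ... visible to a sniffer
cap.close()

print(f"encrypted records: {encrypted}, cleartext handshake records: {cleartext}")
```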

In this case, DTLS uses AES with a 128-bit key, which is considered secure enough by NIST (Figure 21). We consider 128 and 192 bits as medium security, and 256 bits or more as high security, due to the difficulty of breaking the key. A 256-bit key is typically reserved for very sensitive information, such as military or government documents. In the proposed home automation scenario, this level of security is sufficient.

Figure 21: NIST key length recommendations.
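A trivial helper reflecting the thresholds discussed above; these bands follow the text of this section, not a normative standard.

```python
# Map a symmetric key length to the coarse security level used in this discussion.
def key_strength(bits: int) -> str:
    if bits >= 256:
        return "high"     # reserved for very sensitive information
    if bits >= 128:
        return "medium"   # sufficient for the home automation context
    return "weak"

assert key_strength(128) == "medium"   # AES-128 as negotiated by DTLS in EXP1
```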

In conclusion, EXP1 uses a strong cryptographic suite, but messages are only partially protected, since the content of all messages from the ClientHello up to the Finished message is visible. This is why we have labelled confidentiality in EXP1 as "Partial with secure ciphering". The next deliverable will provide a more detailed description of the benchmarking of the other security properties.


5 Conclusions

Nowadays, the IoT ecosystem demands large-scale deployments in which devices provide a high level of security, in order to cover typical threats and vulnerabilities. One of the main ambitions of the project stems from the specification of a certification approach that gives experimenters the ability to assess and compare different IoT security technologies. This certification scheme should be widely accepted, in order to harmonize the vision of IoT security aspects independently of the country or business area. Towards this end, there is a real need for a systematic and automated methodology that enables scalable testing approaches for security aspects in IoT.

This document aimed to provide an initial description of the proposed methodology for this purpose. Specifically, the proposed benchmarking methodology is intended to automate the generation of security tests, as well as the extraction of benchmarking results from these tests. Such results represent the baseline for comparing different security technologies and, consequently, the key input for the labelling and certification approach. The methodology is built on top of different approaches and technologies, which represent the realization of ARMOUR concepts and the instantiation of the methodology.


References

[1] Jing, Q., Vasilakos, A. V., Wan, J., Lu, J., & Qiu, D. (2014). Security of the Internet of Things: perspectives and challenges. Wireless Networks, 20(8), 2481-2501.
[2] Anderson, R., & Fuloria, S. (2009). Certification and evaluation: A security economics perspective. In Emerging Technologies & Factory Automation (ETFA 2009), IEEE Conference on (pp. 1-7). IEEE.
[3] Kruger, C. P., & Hancke, G. P. (2014). Benchmarking Internet of Things devices. In Industrial Informatics (INDIN), 2014 12th IEEE International Conference on (pp. 611-616). IEEE.
[4] Certification de Sécurité de Premier Niveau (CSPN). https://www.ssi.gouv.fr/administration/produits-certifies/cspn/
[5] Coutant, A. French Scheme CSPN to CC evaluation. http://www.yourcreativesolutions.nl/ICCC13/p/CC%20and%20New%20Techniques/Antoine%20COUTANT%20-%20CSPN%20to%20CC%20Evaluation.pdf. Last accessed 22/01/2017.
[6] UL 2900. https://standardscatalog.ul.com/standards/en/outline_2900-1_2
[7] Commercial Product Assurance (CPA). https://www.ncsc.gov.uk/scheme/commercial-product-assurance-cpa
[8] CSA STAR. https://cloudsecurityalliance.org/star/
[9] The Common Criteria. "Collaborative Protection Profiles (cPP) and Supporting Documents (SD)", 2015. https://www.commoncriteriaportal.org/pps/static.htm
[10] Anderson, R., & Fuloria, S. (2009). Certification and evaluation: A security economics perspective. In Emerging Technologies & Factory Automation (ETFA 2009), IEEE Conference on (pp. 1-7). IEEE.
[11] Under Attack: Common Criteria has loads of critics, but is it getting a bum rap? Government Computer News. https://gcn.com/articles/2007/08/10/under-attack.aspx
[12] Heer, T., Garcia-Morchon, O., Hummen, R., Keoh, S. L., Kumar, S. S., & Wehrle, K. (2011). Security Challenges in the IP-based Internet of Things. Wireless Personal Communications, 61(3), 527-542.
[13] Baldini, G., Skarmeta, A., Fourneret, E., Neisse, R., Legeard, B., & Le Gall, F. (2016). Security certification and labelling in Internet of Things. 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT).
[14] Sunyé, G., De Almeida, E. C., Le Traon, Y., Baudry, B., & Jézéquel, J. M. (2014). Model-based testing of global properties on large-scale distributed systems. Information and Software Technology, 56(7), 749-762.
[15] Grabowski, J., Hogrefe, D., Réthy, G., Schieferdecker, I., Wiles, A., & Willcock, C. (2003). An introduction to the testing and test control notation (TTCN-3). Computer Networks, 42(3), 375-403.
[16] PROBE-IT Deliverable. D2.1 Benchmarking framework for IoT deployment evaluation. http://www.probe-it.eu/wp-content/uploads/2012/10/D2-1-Benchmarking-framework-v4.pdf
[17] Rescorla, E., & Modadugu, N. (2012). RFC 6347: Datagram Transport Layer Security Version 1.2. Internet Engineering Task Force.
[18] Shelby, Z., Hartke, K., & Bormann, C. (2014). RFC 7252: The Constrained Application Protocol (CoAP). Internet Engineering Task Force (IETF).
[19] ARMOUR Deliverable. D1.1 – ARMOUR Experiments and Requirements. http://armour-project.eu/wp-content/uploads/2016/08/D11-ARMOUR-Experiments-and-Requirements.pdf
[20] oneM2M-TR-0008 (2013). Analysis of Security Solutions for oneM2M System v0.2.1.
[21] ARMOUR Deliverable. D2.1 – Generic test patterns and test models for IoT Security Testing. http://armour-project.eu/wp-content/uploads/2016/08/D21-Generic-test-patterns-and-test-models-for-IoT-security-testing.pdf
[22] Potter, B., & McGraw, G. (2004). Software security testing. IEEE Security & Privacy, 2(5), 81-85.
[23] Felderer, M., Büchler, M., Johns, M., Brucker, A. D., Breu, R., & Pretschner, A. (2016). Security Testing: A Survey. Advances in Computers, 101, 1-51.
[24] Utting, M., & Legeard, B. (2010). Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann.
[25] Legeard, B., & Bouzy, A. (2013). Smartesting CertifyIt: Model-based testing for enterprise IT. In Software Testing, Verification and Validation (ICST), 2013 IEEE Sixth International Conference on (pp. 391-397). IEEE.
[26] Botella, J., Bouquet, F., Capuron, J. F., Lebeau, F., Legeard, B., & Schadle, F. (2013). Model-Based Testing of Cryptographic Components: Lessons Learned from Experience. In Software Testing, Verification and Validation (ICST), 2013 IEEE Sixth International Conference on (pp. 192-201). IEEE.
[27] Warmer, J. B., & Kleppe, A. G. (1998). The Object Constraint Language: Precise Modeling with UML. Addison-Wesley Object Technology Series.
[28] Willcock, C., Deiß, T., Tobies, S., Engler, F., & Schulz, S. (2011). An Introduction to TTCN-3. John Wiley & Sons.
[29] Rakotoarivelo, T., Ott, M., Jourjon, G., & Seskar, I. (2010). OMF: a control and management framework for networking testbeds. ACM SIGOPS Operating Systems Review, 43(4), 54-59.
[30] Pérez, S., Martínez, J. A., Skarmeta, A., Mateus, M., Almeida, B., & Maló, P. (2016). ARMOUR: Large-Scale Experiments for IoT Security & Trust. 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT).
[31] Selander, G., Palombini, F., & Hartke, K. (2016). Requirements for CoAP End-To-End Security.
[32] Moore, K., Barnes, R., & Tschofenig, H. (2016). Best Current Practices for Securing Internet of Things (IoT) Devices.
[33] Abomhara, M., & Køien, G. M. (2015). Cyber Security and the Internet of Things: Vulnerabilities, Threats, Intruders and Attacks. Journal of Cyber Security, 4, 65-88.
[34] Heer, T., Garcia-Morchon, O., Hummen, R., Loong, S., Kumar, S., & Wehrle, K. (2011). Security Challenges in the IP-based Internet of Things. Wireless Personal Communications, 61, 527-542.
[35] Suo, H., Wan, J., Zou, C., & Liu, J. (2012). Security in the Internet of Things: A Review. International Conference on Computer Science and Electronics Engineering.