Applying machine learning and data analytics to optimize data security and reliability for M2M and the Internet of Things

Mehran Roshandel, Deutsche Telekom AG, Telekom Innovation Laboratories (T-Labs)

Telecoms Fraud and Risk Management, 26th - 28th November 2012, Copthorne Tara Hotel, London


Page 2: M2M related activities at Telekom Innovation Laboratories.

Secure Micro Kernel

Secure mobile middleware

World of connected objects

M2M business enabling demonstrator

Internet of Services / Internet of Things

Prototype of home management infrastructure

Home Management Platform Demonstrator

Telco Enabling for the Cloud

Mobile Wallet

Page 3: Agenda.

Introduction

M2M key risk indicators

Using machine learning within Deutsche Telekom AG

Forecasting incidents in IT server farms and their financial impact

Assessing the risk of financial transactions and its financial impact

Summary

Page 4: Introduction.

Page 5: Smart & Connected Devices

[Figure: timeline (axis: Time)] … every “box” will also be a computer …

Security ???

Page 6: M2M risk indicators

1. Crime statistics

2. M2M relevant attacks in the past

3. Recent “Hacker Conference” Contributions

Page 7: M2M risk indicators …

Crime Statistics: “Criminal Electronic Trespassing” in Germany, case statistics for Paragraph 202a StGB

M2M relevant Attacks: Selected attacks relevant for M2M communication security between 1997 and 2011

Hacker Conference Contributions

Page 8: “Criminal Electronic Trespassing” statistics in Germany. A “Moore's law of hacking”?

[Figure: Cases falling under § 202a StGB in Germany; x-axis: Year (1990 to 2010), y-axis: Cases (logarithmic scale, 1 to 100000)]

Source: Bundeskriminalamt, Germany

Page 9: Hacker conference contributions

Chaos Communication Camp 2011 - hunz

Machine-to-machine (M2M) security

… Smart Meter Hack provides SSL access to backend systems …

Chaos Communication Camp 2011 - hunz

Machine-to-machine (M2M) security

… access vendor network & other cars using a hacked GSM motor module …

Chaos Communication Camp 2011 - Karsten Nohl & Luca Melette

GPRS Intercept - Wardriving phone networks

… Permeating & eavesdropping GPRS data networks …

https://events.ccc.de/camp/2011/

Page 10: M2M DDoS attack

Many devices without anti-virus and firewall protection

Page 11: What is the best way to face security issues?

Protective: Access control, Firewall, Encryption

Preventive: Secure design, Redundancy, Scalable dimensioning

Detective: Malware detection, Deep packet inspection, Incident alarming

Page 12: Example for security by detection. Platform & systems for service monitoring.

Traditional monitoring systems

There are many traditional monitoring systems for the detection of incidents, misuse and fraud

Processing huge amounts of data

Rule and threshold based

Good for known cases of incidents or misuse

Often reactive

Complicated correlations and relationships between input variables cannot be identified by humans

Not adaptive if the scenario changes

Attackers can adapt their strategies

Page 13: Is it sufficient to have only a simple detection of incidents?

Forecast is useful to avoid incidents or major damages.

Image sources:

http://www.livescience.com/24380-hurricane-sandy-status-data.html

http://www.bz-berlin.de/aktuell/deutschland/wetter-wolken-und-regen-nach-pfingsten-article1202936.html

Page 14: Why do we need machine learning?

Innovative pattern based monitoring system

Processing of available data in order to learn from history and predict the future

No human-created rules are needed

There are no thresholds to bypass

Good for known cases of incidents or misuse, but also for unknown threats

Proactive and preventive, not only reactive

The application of machine learning based systems can help where classical methods fail or have their limits.

Detection and prediction of unknown threats
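The contrast between a fixed threshold rule and a model learned from history can be illustrated with a minimal sketch. It is not the method used in the talk: the baseline here is just a mean and standard deviation, the simplest stand-in for a learned pattern, and the load values, the 90% rule and the factor `k` are all hypothetical.

```python
from statistics import mean, stdev

def rule_based_alerts(values, threshold):
    """Classic rule: alert whenever a value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def learned_alerts(history, values, k=3.0):
    """Learn a baseline (mean, standard deviation) from history and flag
    values that lie more than k standard deviations away from it."""
    mu = mean(history)
    sigma = stdev(history)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical CPU load in percent; normal operation sits around 20%.
history = [19, 21, 20, 22, 18, 20, 21, 19, 20, 20]
current = [20, 21, 45, 19, 2]

# The fixed 90% rule sees nothing unusual.
print(rule_based_alerts(current, threshold=90))
# The learned baseline flags both the jump to 45% and the drop to 2%.
print(learned_alerts(history, current))
```

In this toy data an attacker who stays below the 90% rule remains invisible to the rule-based check, while the deviation from the learned normal behavior is still flagged, including the unusually low value that no upper threshold would catch.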

Page 15: ML algorithms need data for learning and classification. Is data privacy an issue?

Deutsche Telekom pays special attention to data privacy

The data privacy guidelines of Deutsche Telekom are more restrictive than required by law.

Machine learning algorithms focus only on learning patterns, which are then used to classify behavior.

Personal data or business critical data will require anonymization or pseudonymization before processing. This applies to all examples presented in this talk.

Germany has one of the best (strongest) sets of data privacy rules.
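One common way to pseudonymize identifiers before pattern learning is a keyed hash. The sketch below assumes HMAC-SHA256 and made-up records; the talk does not say which technique Deutsche Telekom uses, and the key, field names and values are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same pseudonym, so behavioral
    patterns per customer survive, but the original value cannot be
    recomputed without the key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

# Hypothetical key; in practice it would be stored outside the analytics system.
key = b"example-key-kept-outside-the-analytics-system"

# Hypothetical records, invented for illustration only.
records = [
    {"customer": "alice@example.com", "amount": 12.50},
    {"customer": "bob@example.com", "amount": 99.00},
    {"customer": "alice@example.com", "amount": 7.25},
]

pseudonymized = [
    {"customer": pseudonymize(r["customer"], key), "amount": r["amount"]}
    for r in records
]

for row in pseudonymized:
    print(row["customer"][:12], row["amount"])
```

Because the mapping is stable, a detector can still learn that the first and third record belong to the same (unnamed) customer.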

Page 16: Using machine learning within Deutsche Telekom AG

Forecasting the incidents in IT server farms. Project: Anomaly detection in IT server farms (ADIT)

Page 17: Forecasting the incidents in IT server farms. Challenges for IT service management.

Goals:

Ensure availability and comply with SLAs.

Typical types of failure:

Software bugs, Hardware failure, Misconfiguration, Network failure, Overloading, Attacks

Challenges:

Early detection of incidents

Root cause analysis

Fast system recovery

Denial of service can be very expensive

Page 18: Forecasting the incidents in IT server farms. Challenges for IT service management.

Failure Prediction in IT systems

Requirements for monitoring systems:

Detection of disturbances or anomalies of running servers in data centers at an early stage

Early detection: Minimizing downtimes due to incidents

Flexibility in the analysis of new types of measurement data

Prediction of upcoming incident situations and detection of unknown types of incidents

Without specific knowledge about the service application

Solution: Machine learning algorithms have proved to fulfill all these requirements.

Page 19: Forecasting the incidents in IT server farms. Short description of ADIT.

ADIT is a machine learning based monitoring system.

ADIT has an extensible input interface for receiving any kind of data.

ADIT converts the received data into a standard format.

Detectors use the available data (9 weeks from the past) for learning and building a pattern based model.

After the learning process, detectors are able to recognize anomalous behavior.

Detectors are implemented based on different algorithms and specific data.

Detected point anomalies are consolidated into events, which can be sent to other systems or administrators.

[Figure: Failure Prediction in IT systems. Labels: Information acquisition (Business data, Log files, Audit trails, IT measurements); Information processing; Information exploitation; Recommendation/Collaboration; Knowledge repository; Anomaly detector plugins. Caption: Privacy preserving information acquisition, processing, and exploitation.]
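The consolidation step, in which individual point anomalies become events, can be sketched as follows. This is an illustration under assumptions, not ADIT's actual logic: the `Event` class, the `consolidate` function, the `max_gap` parameter and the timestamps (minutes since midnight) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: int   # timestamp of the first point anomaly in the event
    end: int     # timestamp of the last point anomaly in the event
    count: int   # number of point anomalies merged into the event

def consolidate(anomaly_times, max_gap):
    """Merge point anomalies into events.

    A point anomaly joins the current event if it occurs at most
    max_gap time units after the event's last anomaly; otherwise it
    opens a new event.
    """
    events = []
    for t in sorted(anomaly_times):
        if events and t - events[-1].end <= max_gap:
            events[-1].end = t
            events[-1].count += 1
        else:
            events.append(Event(start=t, end=t, count=1))
    return events

# Hypothetical point anomalies reported by a detector.
points = [5, 10, 15, 240, 245, 900]
events = consolidate(points, max_gap=10)

for e in events:
    print(e)
```

With these inputs, six point anomalies collapse into three events, which is the kind of reduction that keeps administrators from receiving one alarm per measurement.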

Page 20: Anomalies in IT servers. Example: Early detection of a real incident.

The real incident (INC-2012-CW05-1) was detected early (more than 7 hours in advance) via the feature “CPUUser”.

Detection by ADIT: 0:05

Incident start time: 7:00

Incident recognized by traditional system: 7:53

Incident solved: 1:00 (next day)
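The lead times implied by these four timestamps can be checked with a few lines of arithmetic. The sketch assumes that the first three times fall on the same day and that only the resolution time is on the next day, as the slide indicates; the helper `hm` is hypothetical.

```python
from datetime import timedelta

def hm(text):
    """Parse an 'H:MM' clock time into a timedelta since midnight."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))

adit_detection = hm("0:05")
incident_start = hm("7:00")
traditional_detection = hm("7:53")
incident_solved = hm("1:00") + timedelta(days=1)  # 1:00 on the next day

lead_over_start = incident_start - adit_detection
lead_over_traditional = traditional_detection - adit_detection
incident_duration = incident_solved - incident_start

print(lead_over_start)        # 6:55:00
print(lead_over_traditional)  # 7:48:00
print(incident_duration)      # 18:00:00
```

The "more than 7 hours" on the slide therefore refers to the lead over the traditional monitoring system (7:48); relative to the official incident start, the lead is 6:55.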

Page 21: Anomalies in IT servers: Summary of results.

Detection times of ADIT for the 7 incidents (True Positive, correctly detected), relative time scale

[Figure: one bar per incident number on a relative time axis (0h to 8h), marking detection by ADIT (T=0h), official incident start, and incident reported by current system]

Incident number: incident ID, delta time

1: INC-2012-CW03-1, 4:02

3: INC-2012-CW05-1, 7:48

4: INC-2012-CW06-1, 1:07

6: INC-2012-CW06-3, 4:57

8: INC-2012-CW07-1, 0:12

9: INC-2012-CW07-2, 1:14

11: INC-2012-CW07-3, 3:44

Sum: 22:04

Avg: 03:08
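The aggregate figures can be recomputed from the seven delta times as they appear in this transcript. The helpers `to_minutes` and `to_hmm` are hypothetical names introduced only for this check.

```python
# Delta times as transcribed from the slide, in incident order.
deltas = ["4:02", "7:48", "1:07", "4:57", "0:12", "1:14", "3:44"]

def to_minutes(text):
    """Convert 'H:MM' to a number of minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def to_hmm(minutes):
    """Convert a number of minutes back to 'H:MM'."""
    return f"{minutes // 60}:{minutes % 60:02d}"

total = sum(to_minutes(d) for d in deltas)
average = total / len(deltas)

print(to_hmm(total))            # 23:04
print(to_hmm(round(average)))   # 3:18
```

The seven values as transcribed add up to 23:04, with a mean of about 3:18, whereas the slide states a sum of 22:04 and an average of 03:08. Either one of the transcribed deltas or the slide's totals may contain a slip; the transcript alone does not settle which.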

Page 22: Forecasting the incidents in IT server farms. Conclusion.

Results of ADIT

Machine learning enables:

Effective and efficient detection of abnormal behavior.

Early prediction

Decrease of downtime and financial losses.

High-quality predictions regardless of the type of server (web, database, application) or data.

In the T-Labs project ADIT, we were able to demonstrate the benefit of machine learning for monitoring IT servers.

[Diagram labels: ADIT, Requirements, Benefits, Strategy]

Page 23: Using machine learning within Deutsche Telekom AG

Assessing the risk of financial transactions and its financial impact

Page 24: Using machine learning in financial domain. Misuse/Fraud detection in financial transactions.

Examples of financial systems

Financial reporting

Moving money (Automated Clearing Bureau)

Invoice Creation and Approval

Product Administration Systems

Insurance transactions

Payment transactions

Page 25: Using machine learning in financial domain. Misuse/Fraud detection in financial transactions.

Types of attacks or anomalies

Destination bank account manipulation

Payment authorization inconsistencies

Approval process inconsistencies

Delegation of authority lapses and abuses

Fraudulent transaction patterns

Insurance fraud

Payment anomalies (Avoidance of potential loss of money)

Page 26: Business impact of machine learning based system on financial transaction system

Financial savings in 2011

The blue curve shows the historical trend of potential losses in 2011 without advanced risk management.

The red curve shows the estimated potential losses with machine learning based risk management.

Page 27: Results of machine learning based risk management. Accuracy of 84% (weighted true positive rate at a 10% false positive rate) as shown in proof-of-concept.

[Figure: Weighted ROC (transactions of 10 euros and up); x-axis: FPR (0 to 1), y-axis: WTPR (0 to 1)]

ROC: Receiver Operating Characteristic

WTPR: Weighted True Positive Rate

FPR: False Positive Rate

WTPR = 84% at FPR = 10%

Evaluation: Over 85 weeks, customer subset
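How a single point on a weighted ROC curve might be computed is sketched below. The slide does not define the weighting; the sketch assumes that each fraudulent transaction is weighted by its amount, so that WTPR is the share of fraudulent money caught, while FPR counts legitimate transactions. The function name, scores, labels and amounts are hypothetical.

```python
def weighted_roc_point(scores, labels, amounts, threshold, min_amount=10):
    """Return (WTPR, FPR) for one decision threshold.

    Assumption: WTPR weights every fraudulent transaction (label 1) by
    its amount. FPR is the plain share of legitimate transactions
    (label 0) that are flagged. Transactions below min_amount are
    ignored, mirroring the "10 euros and up" filter on the slide.
    """
    rows = [(s, y, a) for s, y, a in zip(scores, labels, amounts)
            if a >= min_amount]
    fraud_total = sum(a for s, y, a in rows if y == 1)
    fraud_caught = sum(a for s, y, a in rows if y == 1 and s >= threshold)
    legit_total = sum(1 for s, y, a in rows if y == 0)
    legit_flagged = sum(1 for s, y, a in rows if y == 0 and s >= threshold)
    return fraud_caught / fraud_total, legit_flagged / legit_total

# Hypothetical model scores, fraud labels and amounts in euros.
scores = [0.95, 0.80, 0.30, 0.70, 0.20, 0.10, 0.40,
          0.15, 0.05, 0.25, 0.35, 0.45, 0.12, 0.90]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
amounts = [500, 300, 200, 40, 25, 60, 15, 80, 30, 20, 45, 10, 70, 5]

wtpr, fpr = weighted_roc_point(scores, labels, amounts, threshold=0.5)
print(f"WTPR = {wtpr:.0%} at FPR = {fpr:.0%}")
```

In this toy data only two of the three frauds are caught, an unweighted true positive rate of about 67%, but they carry 800 of the 1000 euros at risk, so the weighted rate is 80%. Sweeping the threshold from 1 down to 0 would trace out the full weighted ROC curve.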

Page 28: Summary.

[Diagram labels: Avoiding of Risk, Monitoring, Machine Learning, M2M]

Page 29: Benefit of machine learning systems for M2M.

Machine learning helps to break the limits of traditional monitoring & risk management systems

The growth of M2M technologies increases the need for security and risk management.

Since many new types of devices are involved in M2M communication, new and complex forms of M2M relationships and risk scenarios arise that cannot be completely described by rule based systems alone.

The application of machine learning based systems can help where classical (security and cryptographic) methods fail or have their limits.

The examples presented have shown that machine learning algorithms provide detection mechanisms that discover anomalies and/or risks which can only be covered by this new technology.

Through the consistent use of machine learning based risk detection, financial losses can be prevented beyond what traditional detection solutions allow.

Page 30: Thank you for your attention!