Security Data Science (SDS) Prof. Tudor Dumitraș Assistant Professor, ECE University of Maryland, College Park ENEE 759D | ENEE 459D | CMSC 858Z http://ter.ps/759d https://www.facebook.com/SDSAtUMD


Security Data Science (SDS)

Prof. Tudor Dumitraș
Assistant Professor, ECE
University of Maryland, College Park

ENEE 759D | ENEE 459D | CMSC 858Z

http://ter.ps/759d

https://www.facebook.com/SDSAtUMD


Introducing Your Instructor

Tudor Dumitraș
Office: AVW 3425
Email: [email protected]
Website: http://ter.ps/759d
Office Hours: Mon 2-3 pm


My Background

• Ph.D. at Carnegie Mellon University
– Research in distributed systems and fault-tolerant middleware

• Worked at Symantec Research Labs
– Built WINE platform for Big Data experiments in security
– WINE currently used by academic researchers and Symantec engineers

• Joined UMD faculty

• Research and teaching on applied security and systems
– Focus on solving security problems with data analysis techniques


SDS In A Nutshell

• Course objectives
– Ability to understand and interpret scholarly publications, to explain their key ideas, and to provide constructive feedback
– Ability to apply some of these ideas in practice

• Topics
– Vulnerabilities and exploits
– Failures of cryptosystems
– Internet worms
– Denial of service
– Botnets
– Spam infrastructures
– Pay per install
– Attacks against physical infrastructure
– Targeted attacks
– Economic implications of cybercrime

• Grading
– 50% paper reviews and class participation
– 50% projects


We Are Swimming in Data

• Data created/reproduced in 2010: 1,200 exabytes
• Data collected to find the Higgs boson: 1 gigabyte / s
• Yahoo: 200 petabytes across 20 clusters

• Security:
– Global spam in 2011: 62 billion / day
– Malware variants created in 2011: 403 million
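
To put the figures above in perspective, the short Python sketch below converts them into per-second and per-day rates. The input numbers are taken from the slide; the variable names, the decimal units (1 TB = 1,000 GB), and the 365-day year are assumptions made for illustration.

```python
# Back-of-the-envelope rates derived from the figures on the slide.
SECONDS_PER_DAY = 24 * 60 * 60

spam_per_day = 62e9          # global spam in 2011, messages per day (from the slide)
higgs_gb_per_second = 1      # data collected to find the Higgs boson (from the slide)
variants_per_year = 403e6    # malware variants created in 2011 (from the slide)

spam_per_second = spam_per_day / SECONDS_PER_DAY
# Assumption: decimal units, 1 TB = 1,000 GB.
higgs_tb_per_day = higgs_gb_per_second * SECONDS_PER_DAY / 1000
# Assumption: a 365-day year.
variants_per_second = variants_per_year / (365 * SECONDS_PER_DAY)

print(f"spam messages per second: {spam_per_second:,.0f}")
print(f"Higgs data per day: {higgs_tb_per_day:.1f} TB")
print(f"new malware variants per second: {variants_per_second:.1f}")
```

Even the smallest of these rates, roughly a dozen new malware variants every second, is far beyond what manual analysis can keep up with.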


Why So Much Data?

• We can store it
– 6¢ / GB
– 29¢ / GB (SAS HDD)

• We can generate it
– Most data is machine-generated
– Most malware samples are variants of other malware, generated automatically (repacking, obfuscation)

What to do with all this data?


Three Stories about Data


ONE

WHAT QUESTIONS TO ASK ON A FIRST DATE?
The Power of Big Data


If You Want to Know …
Do my date and I have long-term potential?


If You Want to Know …
Do my date and I have long-term potential?

… ask:
Q Do you like horror movies?
Q Have you ever traveled around another country alone?
Q Wouldn't it be fun to chuck it all and go live on a sailboat?

Likelihood of coincidence: 3.7×

Data: 275,000 user-submitted questions, 34,260 real-world couples

Psychology: top 3 user-rated questions, about:
• God
• Sex
• Smoking
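
One plausible reading of the 3.7× figure is a lift: how much more often real couples agree on all three questions than pairs would by coincidence. The sketch below computes such a ratio on hypothetical toy data. The function names, the answers, and the choice of randomly matched pairs as the baseline are assumptions, not OkCupid's actual method or data.

```python
def all_agree_rate(pairs):
    """Fraction of pairs whose two members gave identical answers to every question."""
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def agreement_lift(couples, baseline_pairs):
    """How many times more often couples agree on everything than baseline pairs do."""
    return all_agree_rate(couples) / all_agree_rate(baseline_pairs)


# Hypothetical toy answers (horror movies, traveled alone, sailboat); NOT OkCupid data.
Y, N = True, False
couples = [
    ((Y, Y, N), (Y, Y, N)),
    ((N, N, N), (N, N, N)),
    ((Y, N, Y), (Y, N, Y)),
    ((Y, Y, Y), (N, Y, Y)),
]
random_pairs = [
    ((Y, Y, N), (Y, Y, N)),
    ((N, N, N), (Y, N, N)),
    ((Y, N, Y), (N, N, Y)),
    ((Y, Y, Y), (N, Y, N)),
]

# 3 of 4 couples agree on everything versus 1 of 4 random pairs.
print(agreement_lift(couples, random_pairs))
```

With real data the baseline rate would have to be nonzero, and the sample large enough for the ratio to be stable, which is where tens of thousands of couples help.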


Online Dating and Big Data

• eHarmony
– Analyzes hundreds of behavioral variables, most collected automatically
– CTO: former search engineer at Yahoo!

• OkCupid: “We do math to get you dates”
– Founded by Harvard math & CS majors

• PlentyOfFish: “Building this matching system was harder than [being] cited in the paper that won the Fields Medal”

Source: CNN Money


Early 1900s: Most Factories Had Private Generators

Electricity was critical for business, but not widely available

Source: Nicholas Carr


Source: OkCupid

Is he an engineer?

Does she date engineers?

Data analytics provide remarkable insight

Applications in many disciplines


What Is Data Science?

• Also known as …

…Big Data analytics

…Machine intelligence

…Data-intensive computing

…Data wrangling

…Data munging

…Data jujitsu


Source: Drew Conway


TWO

IMPROVING MACHINE TRANSLATION
The Unreasonable Effectiveness of Data


2005 NIST Machine Translation Competition

English-Arabic competition

• Google’s first entry
– None of the engineers spoke Arabic

• Simple statistical approach

• Trained using United Nations documents
– 200 million translated words
– 1 trillion monolingual words


For many hard problems there appears to be a threshold of sufficient data

A. Halevy et al., CACM 2009.


What is Security Data Science?

• Also known as …

… Security analytics

… Surveillance analytics

• Applying data science methods to security problems


Security Principles in 60 Seconds [J. Saltzer & M. Schroeder, SOSP 1973]

• Economy of mechanism: Keep the protection mechanism as simple and small as possible
• Fail-safe defaults: Base access decisions on permission rather than exclusion
• Complete mediation: Check every access to every object
• Open design: Do not keep the design secret
• Separation of privilege: Require two keys to unlock, not one
• Least privilege: Grant every program/user the least set of privileges necessary to complete the job
• Least common mechanism: Minimize the amount of mechanism common to more than one user and depended on by all users
• Psychological acceptability: Design interfaces for ease of use


Security in Practice
(Source: C. Nachenberg, Symantec)

• 1986: Simple computer viruses
– Defense: anti-virus

• 1990: Polymorphic viruses (decryption logic + encrypted malicious code)
– Defense: “universal” decoder, emulation

• 1995: Macro viruses
– Defense: AV vendor cooperation, digital signatures for macros

• 1999: Worms
– Defense: Vulnerability-specific signatures

• 2004: Web-based malware
– Defense: behavior blocking

• 2006: Auto-generated malware
– Defense: reputation based security

• 2010 (but probably earlier): Targeted attacks (physical infrastructure, 0-day, etc.)
– Defense: ??


THREE

UNDERSTANDING ZERO-DAY ATTACKS
The Need for Security Data Science

Zero-Day Attacks: Recent Examples

2009: Operation Aurora against Google

2010: Stuxnet

2011: Attack against RSA

Zero-day attack = cyber attack exploiting a software vulnerability before the public disclosure of the vulnerability
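
The definition above can be turned into a simple timestamp comparison. The following sketch, assuming one already has the dates on which an exploit was observed and the public disclosure date, separates zero-day attacks from later ones and measures how long the exploit was in use before disclosure. The dates and function names are hypothetical.

```python
from datetime import date


def split_by_disclosure(attack_dates, disclosure):
    """Separate attacks seen before public disclosure (zero-day) from the rest."""
    zero_day = [d for d in attack_dates if d < disclosure]
    later = [d for d in attack_dates if d >= disclosure]
    return zero_day, later


def zero_day_window_days(attack_dates, disclosure):
    """Days between the earliest observed attack and disclosure; 0 if none preceded it."""
    zero_day, _ = split_by_disclosure(attack_dates, disclosure)
    return (disclosure - min(zero_day)).days if zero_day else 0


# Hypothetical observations for one made-up vulnerability.
disclosure = date(2011, 3, 15)
attacks = [date(2010, 12, 1), date(2011, 2, 1), date(2011, 4, 1)]

zero_day, later = split_by_disclosure(attacks, disclosure)
print(len(zero_day), len(later), zero_day_window_days(attacks, disclosure))
```

The window computed this way is only a lower bound: it starts at the earliest attack that happened to be observed, not at the first attack that actually took place.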


Price of Zero-Day Exploits on the Black Market


The Economist, March 2013


The Elderwood Project

Group with “seemingly unlimited” supply of zero-day exploits
(Source: Symantec)

Zero-Day Attacks: Open Questions

Decade-long open questions
• How common are zero-day attacks?
• How long can they remain undiscovered?
• What happens after disclosure?

[Figure: vulnerability timeline. Events shown: Creation; Vulnerability disclosed (“day zero”); Exploit used in attacks; Security patch released; All hosts patched. One span of the timeline is labeled “Zero-day attack”.]

Prior work: [Arbaugh 2000, Frei 2008, McQueen 2009, Shahzad 2012]

Zero-Day Attacks: Open Questions (cont’d)

Decade-long questions: Why still open?
• Rare events, hard to observe in small data sets
• Need data analysis at scale

[Figure: vulnerability timeline, time axis in weeks. Events shown: Creation; Vulnerability disclosed (“day zero”); Exploit used in attacks; Security patch released; All hosts patched. Labels: Before disclosure: targeted attacks; After disclosure: large-scale attacks; Rare events.]
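
A short calculation shows why rare events need data at scale. Assuming, purely for illustration, that each monitored host independently sees a zero-day attack with a small probability p, the chance of observing at least one attack among n hosts is 1 − (1 − p)^n. The value of p and the host counts below are hypothetical.

```python
import math


def prob_at_least_one(p, n):
    """P(at least one of n independent hosts is hit), each with probability p."""
    # Equivalent to 1 - (1 - p) ** n, written to stay accurate for very small p.
    return -math.expm1(n * math.log1p(-p))


p = 1e-6  # hypothetical per-host probability of seeing a zero-day attack
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} hosts: {prob_at_least_one(p, n):.4f}")
```

Under these assumptions a data set covering a thousand hosts almost never contains such an attack, while one covering millions of hosts almost always does.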


Research in Security Data Science

Challenge 1: Find the needle in the haystack
– Example: Identify and measure zero-day attacks

Challenge 2: Ensure generally applicable and repeatable results
– The threat landscape changes frequently

Challenge 3: Deal with new and advanced threats
– Skilled and persistent hackers can bypass firewalls, anti-virus, password-protected systems, two-factor authentication, physical isolation

[…]

Your thesis topic goes here

[Figure: number of variants (axis ticks 10, 10^3, 10^5) versus time in weeks (−100 to 150, centered on T0). Annotations: 403 million new malware variants created in 2011; Targeted attacks before disclosure; Rare events.]

What is Security Data Science? (re-visited)

• Systems knowledge: develop technologies needed to store and process massive data sets
• Statistics & machine learning knowledge: analyze the data and extract information
• Security knowledge: ask the right questions about cyber attacks

• Data scientists are in high demand in the cybersecurity industry

“Booz Allen may be recruiting more [data scientists] than Google or Facebook”
The Economist, June 2013


Course Content

• Introduction to Security Data Science

• Hands-on emphasis – this is largely an unexplored research area
– Team-based projects
– Reviews of scholarly publications
– No textbook

• Specific things you can expect to learn
– Selected topics in security
– System skills: Experiment design, data analysis, scalability
– Team skills: Cooperating to achieve your team goals
– Speaking/writing skills: Presenting paper/project findings, providing constructive feedback


This is an Advanced Course

• You are responsible for holding up your end of the educational bargain
– I expect you to attend classes and to complete reading assignments
– I expect you to learn how to analyze data and to try things out for yourself
– I expect you to know how to find research literature on security topics
  • The required readings provide starting points
– I expect you to manage your time
  • In general there will be one written assignment due before each lecture

• Learning material in this course requires participation
– This is not a sit-back-and-listen kind of course; class participation is required for understanding the material and makes up a part of your grade!

• Different grading criteria for graduate and undergraduate students


Reading Assignments

• Readings: 1-2 papers before each lecture
– Not light reading – some papers require several readings to understand
– For next time: C. Kanich et al., 'Spamalytics: An Empirical Analysis of Spam Marketing Conversion,' ACM CCS, 2008.
– Check course web page (still in flux) for next readings and links to papers

• Homeworks: review the papers you read using a defined template
– Submit homework by email to [email protected]
  • We might switch to a Web-based submission system in the future
– Due at 6 pm the evening before class
– BibTeX template: Summary, Contributions, Weaknesses, Opinion (optional)
– I will provide feedback on some of your written critiques; no email means your writeup is satisfactory

• In-class discussion: stand up and talk about the papers
– Volunteers are preferred
– Students randomly selected if no volunteers


Discuss …
Do my date and I have long-term potential?

… ask:
Q Do you like horror movies?
Q Have you ever traveled around another country alone?
Q Wouldn't it be fun to chuck it all and go live on a sailboat?

Likelihood of coincidence: 3.7×

Data: 275,000 user-submitted questions, 34,260 real-world couples

Psychology: top 3 user-rated questions, about:
• God
• Sex
• Smoking


Course Projects

• Pilot project: two-week individual projects
– Propose a security problem and a data set that you could analyze to solve it
  • Some ideas are available on the web page
– Conduct preliminary data analysis and write a report
– Propose projects by September 9th (soft deadline)
– Submit report by September 18th

• Group project: ten-week group project
– Deeper investigation of promising approaches
– Submit written report and present findings during last week of class
  • 2 checkpoints along the way (schedule on the course web page)
– Form teams and propose projects by September 30th

• Peer reviews: review at least 2 project reports from other students
– Use skills learned from paper reviews
– Post project proposals, reports and reviews on Piazza


Pre-Requisite Knowledge

• Good programming skills
– Knowledge of languages commonly used in data analysis, like Matlab or R, is a plus
– To brush up: ‘Data Analysis and Visualization with MATLAB for Beginners’ seminar, on September 12 at 5pm, Room 1110 Kim Engineering Building

• Ability to come up to speed on advanced security topics
– Covered in the paper readings
– Basic knowledge of security (CMSC 414, ENEE 459C or equivalent) is a plus

• Ability to come up to speed on data analytics
– Lectures provide light-duty tutorials, but you will need to pick up the details as you go along


Policies

• “Showing up is 80% of life” – Woody Allen
– Participation in in-class discussions is required for full credit
– You can get an “A” with a few missed assignments, but reserve these for emergencies (conference trips, waking up sick, etc.)
– Notify the instructor if you need to miss a class, and submit your homework on time

• UMD’s Code of Academic Integrity applies, modified as follows:
– Complete your homework entirely on your own. After you hand in your homework, you are welcome (and encouraged) to discuss it with others
– Discuss the problems and concepts involved in the project, but produce your own project implementation, report and presentation
  • Group projects are the result of team work

• See class web site for the official version


Classroom Protocol

• Please arrive on time; lecture begins promptly
– I also promise to end on time
– Handouts, readings and homework templates are posted on the class web page

• Questions are encouraged
– If you don’t understand, ask; probably other students are struggling too
– Explain the content of your reading assignment, and the underlying reasoning, to the rest of the class
– Your reasons don't have to be “right” – you just have to be able to explain them

• There is no way to cover everything
– If there is an interesting aspect that we do not cover in class, feel free to incorporate that in your projects


Grading Criteria

• Straight scale: A≥90; B≥80; C≥70; D<70
– 50% Written paper critique and class discussion
  • 24 assignments x 2 points each + 2 points for this lecture
– 50% Projects
  • 30 points for group project, 10 points for pilot project, 10 points for project reviews
– 10% Subjective evaluation

• Expectations
– Graduate students: you can explain the contributions and weaknesses of the papers you read
– Undergraduates: you demonstrate a general understanding of the papers

• Unsatisfactory participation means:
– You did not read the papers
– You did not produce a working implementation for your project, or you do not understand how the implementation works
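
The point budget above can be checked with a few lines of arithmetic. This sketch adds up the listed points and applies the straight scale. How the 10% subjective evaluation combines with the two 50-point components is not stated on the slide, so it is left out here. Function and constant names are illustrative.

```python
# Point budgets as listed on the slide.
REVIEW_POINTS = 24 * 2 + 2        # 24 assignments x 2 points + 2 points for this lecture
PROJECT_POINTS = 30 + 10 + 10     # group project + pilot project + project reviews


def letter_grade(total_points):
    """Map a point total to a letter using the straight scale from the slide."""
    if total_points >= 90:
        return "A"
    if total_points >= 80:
        return "B"
    if total_points >= 70:
        return "C"
    return "D"


print(REVIEW_POINTS, PROJECT_POINTS)
# Hypothetical student: 44 review points and 41 project points.
print(letter_grade(44 + 41))
```

Each component adds up to 50 points, matching the two 50% shares on the slide.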


Review of Lecture

• What did we learn?
– Data analytics provide real benefits
– Analyzing large data sets allows tackling long-standing hard problems
– Difference between security principles and security in practice
– Examples of security problems that require insights from large data sets

• I want to emphasize
– This is a systems course, not a pen-and-paper course
– You will be expected to build a real, working, data analysis tool

• What’s next?
– Basic statistics and experimental design
– Pilot project: proposal, approach, expectations

• Deadline reminder
– Post pilot project proposal on Piazza by Monday (soft deadline)
– First homework due on Sunday at 6 pm


Dive In
http://ter.ps/759d