Forensic Framework Forensics Problemsnflaw/EIE4114Sem22018-19/part3s.pdf · Computational Forensics...

Machine Learning Forensics

Forensic Framework

Collection: identify and collect digital evidence
  Selective acquisition? Cloud storage?
  Generate a data subset for examination?
Examination of evidence
  String search? Pattern matching? Data visualization (timeline analysis)?
Analysis: determine data significance and draw conclusions
  Data mining? Cluster analysis, discriminant analysis, rule mining
  Attribution: “Who did it?” (source)
  Authentication: synthetic data? Forgery?
Presentation

Forensics Problems

Data is evidence
  Collect data from every source (credit card transactions, cell phone calls, email, chat, browser history, documents, data stored in databases, …)
Web and wireless (digital) crimes
  Data is high in volume, high in velocity, and heterogeneous in nature
  Recognizing patterns and analyzing data require huge manpower

Computational Forensics

Applies computational methods to forensics
Large-scale investigations
  Large volumes of data from a wide range of sources
  E.g., malware traces: identify patterns
Automation
  E.g., a simple triangulation method can estimate the location of an IP address within one to two minutes
Analysis: machine learning

Machine Learning

“gives computers the ability to learn without being explicitly programmed”
“provides systems the ability to automatically learn and improve from experience”
The process of learning begins with observations or data (examples, direct experience) in order to look for patterns in the data and make better decisions in the future based on the examples provided

Machine Learning Forensics

Analyze vast amounts of data to discover risk and to detect criminal behaviour (recognize patterns of criminal activities)
Seeks to learn from experience/data to predict future criminal behaviour (to prevent digital crimes or trigger real-time countermeasures in response)

Incorporate ML in Forensics

Analysis: use machine learning to derive knowledge that addresses the purpose of the investigation

Incorporate ML in Forensics

Reporting/Presentation
  Describe the actions used, explain how tools and procedures were selected, and determine what other actions need to be performed (e.g., examining additional data sources, attributes and variables; securing identified vulnerabilities; providing recommendations for improving policies, procedures, tools, etc.)

Example: Fraud Detection

Step 1: understand the investigation objective
  Know the requirements from a business or law enforcement perspective
  Convert them into a forensic problem definition
  Draft a preliminary plan; outline the benefits of the machine learning approach
Step 2: understand the data
  Understand the fraud or crime that needs to be detected
  Ensure that appropriate data sets are collected
  Identify data quality problems and see how they will impact the results obtained
  Form hypotheses

Example: Fraud Detection

Step 3: data preparation strategy
  Data attribute selection
  Data cleaning/transformation: how to deal with missing values
Step 4: forensic modeling
  Several approaches are possible for the same fraud detection problem
  Construct multiple models/approaches to compare error rates
  Used in co-operation?
Step 5: investigation evaluation
  Evaluate the results and review the steps used to construct them
  High number of false positives?
    Customers will get upset
    Business processes may be delayed
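To make "compare error rates" and "high number of false positives" concrete, the sketch below computes the overall error rate and the false-positive rate for two candidate models. The labels and predictions are hypothetical values invented for illustration (1 = fraud, 0 = legitimate), and the helper name `error_rates` is not from the slides.

```python
def error_rates(y_true, y_pred):
    """Return (error_rate, false_positive_rate) for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    wrong = sum(1 for t, p in pairs if t != p)
    error_rate = wrong / len(pairs)
    legit = false_pos + true_neg
    false_pos_rate = false_pos / legit if legit else 0.0
    return error_rate, false_pos_rate


# Hypothetical ground truth and predictions from two candidate models
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
model_a = [0, 1, 0, 1, 0, 0, 1, 1, 1, 0]  # catches all fraud, flags 3 legitimate cases
model_b = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # no false alarms, misses one fraud

for name, pred in (("A", model_a), ("B", model_b)):
    err, fpr = error_rates(y_true, pred)
    print(f"model {name}: error rate {err:.3f}, false-positive rate {fpr:.3f}")
```

In this made-up case model A has the higher false-positive rate (more upset customers) while model B misses a fraud, which is the kind of trade-off Step 5 asks the investigator to review.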

Example: Fraud Detection

Step 6: detection deployment
  Any model requires a commitment to continuous learning, improvement, and automated monitoring
  Refresh the model to capture the ever-changing characteristics of criminal avoidance
  E.g., fraudsters’ patterns change too rapidly for a purely rules-based approach to be effective

Example
  https://www.ncr.com/financial-services/enterprise-fraud-prevention/fractals

Terminology

Machine Learning
  Learn from past experiences; past experiences are represented by the data
Methods
  Supervised: data are labeled with pre-defined categories, which provide the supervision
  Unsupervised: data are unlabeled; the aim is to identify patterns

Terminology

Extractive Forensics
  Goal: extract relationships, discover networks of associations, find key concepts in unstructured content
  Link analysis, text mining
Inductive Forensics
  Clustering incidents and crimes
  Unsupervised learning: determine how the data are organized
Deductive Forensics
  Decision trees, rule generators

Application example

A credit card company receives thousands of applications for new cards. Each application contains information about the applicant: age, marital status, annual salary, outstanding debts, credit rating, …

Problem: decide whether an application should be approved, i.e., classify applications into two categories, approved and not approved.

Application example

Machine Learning Forensics
  Earliest applications: employed by credit card issuers
  Monitor and detect potential credit card theft
  Dataset: contains both legal and illegal transactions
  Learn from these transactions, make predictions for “future transactions”

An example statement

Data: credit card application data
Task: predict whether an application should be approved or not
Performance measure: accuracy
No learning: classify all future applications (test data) to the majority class (i.e., Yes): accuracy = 9/15 = 60%
We can do better than 60% with learning.
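The "no learning" baseline above can be sketched in a few lines. The label list is an assumption: it is constructed only to match the 9 "Yes" / 6 "No" split implied by 9/15, and it scores the baseline on that same split, since the underlying table is not reproduced here.

```python
from collections import Counter

# Assumed labels: 9 "Yes" and 6 "No", matching the 9/15 figure in the slide.
labels = ["Yes"] * 9 + ["No"] * 6

# The majority-class baseline always predicts the most frequent label.
majority_class, majority_count = Counter(labels).most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(f"always predict {majority_class!r}: accuracy = {baseline_accuracy:.0%}")
```

Any learned model is only useful if it beats this figure on held-out data.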

Extractive Forensics: Link Analysis and Text Mining

Link Analysis

Aim: uncover hidden associations
  “who knew whom, where and when”
Initial type of analysis
  Analyze cell phone calls (numbers that have been dialed), emails, text messages between suspects and associates, and transactions during a given time frame
  Circles (nodes): individuals/companies
  Links (edges): convey the strength of relationships
    The stronger the link, the thicker the line connecting the nodes
  Provides a graphical network displaying crucial relationships
Hope: simplify and narrow the scope of the investigation
  Search for outliers, understand known patterns, discover new patterns

Link Analysis

Example (Handout)
  A list of suspects: {Able, Baker, Charley, David, Edwards, Frank}
  They participated in various activities
    Baker was involved in “weapons theft” and “bombing”
    Charley was involved in “weapons theft”
    Edwards was involved in “bombing”
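The handout example can be turned into a small link graph by connecting two suspects whenever they share an activity. This is a minimal sketch, not the handout's own method: only the three involvements listed above are used, the other suspects are left without activities as an assumption, and `build_links` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations

# Suspect -> activities. Only Baker, Charley and Edwards have activities listed
# in the slide text; the remaining suspects are left empty (assumption).
activities = {
    "Able": set(),
    "Baker": {"weapons theft", "bombing"},
    "Charley": {"weapons theft"},
    "David": set(),
    "Edwards": {"bombing"},
    "Frank": set(),
}


def build_links(activities):
    """Return {(a, b): shared activities} for every pair sharing at least one."""
    links = {}
    for a, b in combinations(sorted(activities), 2):
        shared = activities[a] & activities[b]
        if shared:
            links[(a, b)] = shared
    return links


links = build_links(activities)

# Node degree: how many other suspects each person is linked to.
degree = defaultdict(int)
for a, b in links:
    degree[a] += 1
    degree[b] += 1

for (a, b), shared in links.items():
    # The number of shared activities plays the role of link thickness.
    print(f"{a} -- {b}: weight {len(shared)} ({', '.join(sorted(shared))})")
print("most connected:", max(degree, key=degree.get))
```

With these inputs Baker is the only suspect tied to both activities, so Baker ends up as the hub of the network.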

Example

Case study: major drug case
  https://www.youtube.com/watch?v=FzmrLDHXJ50
Visualizing call data records
  https://www.youtube.com/watch?v=J38tKqq9kpY
Analyzing a criminal network using link analysis
  https://www.youtube.com/watch?v=UYdXOXpT9wM

Link Analysis

Two roles for investigators
  Enables visualization of relationships
  Leads to the discovery of different types of node associations
Weakness
  Requires human interpretation
Example usage
  Department of Homeland Security: uses link analysis to create networks of associations in travelers’ screenings

Self-study

Analyst’s Notebook
  https://youtu.be/EIFu_oUiaBY
  https://www.youtube.com/watch?v=FdXlZ95xF40
NetMiner
  Features: http://www.netminer.com/product/features.do
  Demo: http://www.netminer.com/product/demo.do#analy
  Education copy
MarketVisual: provides online visual relationship mapping. The website can discover relationships of companies or persons under investigation (http://www.marketvisual.com/)

Case Study (ex 1 and ex 2)

Text Mining

Aim: sort and organize massive amounts of unstructured information
  Documents, notes, emails, chat, web forms, voice mail, records, spreadsheets, presentations, invoices, blogs, …
  Extract names, email addresses, IP addresses, …
A company has nearly 80% of its data in unstructured formats
Applications: text categorization, text clustering, concept extraction, sentiment analysis, document summarization
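Extracting email and IP addresses from unstructured text, as mentioned above, is often done with pattern matching. The sketch below uses deliberately simplified regular expressions (an assumption; real-world address formats are messier), and the sample text is hypothetical.

```python
import re

# Simplified patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_valid_ipv4(candidate):
    """Reject dotted quads with an octet above 255 (e.g. version strings)."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


def extract_entities(text):
    """Return the unique email addresses and IPv4 addresses found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    ips = sorted({ip for ip in IPV4_RE.findall(text) if is_valid_ipv4(ip)})
    return {"emails": emails, "ips": ips}


# Hypothetical log fragment
sample = (
    "Login from 192.168.1.20 by alice@example.com; "
    "forwarded to bob@example.org via 10.0.0.5. Build 999.1.1.1 ignored."
)
print(extract_entities(sample))
```

The extracted values can then feed link analysis, for example by treating each address as a node.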

Text categorization

Problem: label text with one or more category labels
  Collect some sample documents for each of the categories
  Train a classifier
  Run the classifier on new data
  The classifier will return a label
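The collect, train, run, label workflow above can be illustrated with a small classifier. The slides do not name an algorithm, so this sketch swaps in multinomial naive Bayes with add-one smoothing as one common choice; the class name, the two categories and the training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(document):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: hypothetical sample documents for each category
train_docs = [
    "wire the money to the offshore account today",
    "transfer funds before the audit starts",
    "lunch meeting moved to noon",
    "see you at the team meeting tomorrow",
]
train_labels = ["suspicious", "suspicious", "routine", "routine"]

# Step 2: train; steps 3 and 4: run on new text and read the returned label
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("please transfer the money to my account"))
print(clf.predict("team lunch tomorrow"))
```

Four training sentences are far too few for real use; the point is only the shape of the workflow.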

Example

Example: customer experience management
  Use text analysis software to “listen, analyze, relate and act on millions of customer conversations”
  Whirlpool microwave oven incident: machines were arcing and producing electrical sparks, causing food inside to smoke
    Keyword searching: “arcing”, “smoke”
    Found 18,500 records
    Formed a team to further analyze these records
    Made 700 calls to customers who may have had the problem

Case Study (ex 3 and ex 4)

Inductive Forensics: Clustering of Crimes and Incidents

Clustering
  The forensic process of decomposing a set of observations into several subsets whose members share some similarities
  Seeks to determine how the data are organized

Real-life examples

Example 1: group people of similar sizes: “small”, “medium”, “large” T-shirts
  Tailor-made for each person: too expensive
  One size fits all: does not fit all
Example 2: marketing, segment customers according to their similarities
  Targeted marketing
Example 3:
  Insurance: detect groups of people who stage accidents to collect insurance
  Money laundering: detect suspicious money transactions
  Telecom industry: find calling patterns that deviate from the norm

Clustering

Clustering is performed autonomously by the program, directly from the data
Exploratory type of analysis
  Isolate anomalies in the data (questionable financial transactions, suspicious use of a computer port)
Conducting inductive forensic investigations
  Identify the source of the words/behaviors you want to cluster (e.g., suspect names, phone numbers, IP addresses)
  Build and train a cluster of words: use software to let the words organize themselves into groups
  Evaluate the accuracy against new words
  Examine the clusters that have been revealed

Step 1
  Consider two clusters (k = 2)
  Randomly choose two centroids

Centroid   v1   v2
Group 1    1    1
Group 2    5    7

Step 2: distance

Subject  v1   v2   Distance to Group 1  Distance to Group 2  Group
1        1    1    0                    7.211103             1
2        1.5  2    1.118034             6.103278             1
3        3    4    3.605551             3.605551             1
4        5    7    7.211103             0                    2
5        3.5  5    4.716991             2.5                  2
6        4.5  5    5.315073             2.061553             2
7        3.5  4.5  4.301163             2.915476             2

Step 3
  2 groups: {1, 2, 3} and {4, 5, 6, 7}
  New centroids:
    Group 1:
      Variable 1 = (1 + 1.5 + 3)/3 ≈ 1.83
      Variable 2 = (1 + 2 + 4)/3 ≈ 2.33
    Group 2:
      Variable 1 = (5 + 3.5 + 4.5 + 3.5)/4 ≈ 4.12
      Variable 2 = (7 + 5 + 5 + 4.5)/4 ≈ 5.38
  Iteration: use the new centroids

Subject  v1   v2   Group 1  Group 2
1        1    1
2        1.5  2
3        3    4
4        5    7
5        3.5  5
6        4.5  5
7        3.5  4.5

Step 4
  2 groups: {1, 2} and {3, 4, 5, 6, 7}
  New centroids:
    Group 1:
      Variable 1 = (1 + 1.5)/2 = 1.25
      Variable 2 = (1 + 2)/2 = 1.5
    Group 2:
      Variable 1 = (3 + 5 + 3.5 + 4.5 + 3.5)/5 = 3.9
      Variable 2 = (4 + 7 + 5 + 5 + 4.5)/5 = 5.1
  Iteration: use the new centroids; no change, stop

Centroid   v1    v2
Group 1    1.25  1.5
Group 2    3.9   5.1

Subject  v1   v2   Distance to Group 1  Distance to Group 2  Group
1        1    1    0.559017             5.021952             1
2        1.5  2    0.559017             3.920459             1
3        3    4    3.051639             1.421267             2
4        5    7    6.656763             2.19545              2
5        3.5  5    4.160829             0.412311             2
6        4.5  5    4.776243             0.608276             2
7        3.5  4.5  3.75                 0.72111              2
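Steps 1 to 4 can be reproduced with a short k-means loop. This is a minimal sketch using the seven subjects and the initial centroids (1, 1) and (5, 7) from the worked example; the function name `kmeans` and the rule that ties go to the lower-numbered group are assumptions, the latter chosen to match the treatment of subject 3 in Step 2.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then update.

    Ties are resolved in favour of the lower-numbered group.
    """
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda g: math.dist(p, centroids[g]))
            for p in points
        ]
        if new_assignment == assignment:  # no change, stop
            break
        assignment = new_assignment
        for g in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == g]
            if members:  # keep the old centroid if a group becomes empty
                centroids[g] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


# Subjects 1..7 as (v1, v2), taken from the tables above
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]

assignment, centroids = kmeans(points, [(1, 1), (5, 7)])
for subject, group in enumerate(assignment, start=1):
    print(f"subject {subject}: group {group + 1}")
print("centroids:", [tuple(round(c, 2) for c in ctr) for ctr in centroids])
```

The loop ends with groups {1, 2} and {3, 4, 5, 6, 7}, the same grouping as the final table.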

Clustering

Demonstration
  https://www.viscovery.net/demos/click-stream-analysis

Case Study (ex 5)

Health Care Fraud Detection: Problem
  High volume of data

Framework

Clustering: unsupervised search for anomalous patterns

Deductive Forensics: Decision Tree, Rule Generators

Definition
  Models historical data so as to predict criminal outcomes or fraudulent events
  Identify when and where criminal activity will take place and in what form
  Uncover problems before they occur
Example 1: fraud detection
  Identify anomalies in credit card activity or insurance claims activity so that fraud is detected early and losses are minimized

Example

Example 2:
  From historical data: discover that most car thefts occur between midnight and 2am on weeknights in hotel parking lots
  Solution: monitor hotel parking lots more frequently between midnight and 2am on weeknights

Decision Tree

Case: a credit card company receives thousands of applications for new cards (or a company receives applications for loans)
  Each application contains information such as age, marital status, annual salary, outstanding debts, credit rating, …
Problem: decide whether an application should be approved or not
Outcome: “approved” and “not approved”

Problem
  Learn a model from the “historical” data
  Use the model for prediction
  What will be the output?

Decision Tree Learning
  A widely used technique

Tree Design
  Want a small but accurate tree
  Hot research topic
  Main idea:
    Reduce uncertainty
    Each subset of the data should belong to the same class

Tree Design
  (B) is better
  Entropy: a measure of uncertainty; information gain is the reduction in entropy after a split

Tree Design

Step 1: calculate the entropy of the outcome
  2 classes: Yes and No
  Prob(Yes) = 9/15, Prob(No) = 6/15
  $\mathrm{Entropy}(\mathrm{class}) = -\frac{6}{15}\log_2\frac{6}{15} - \frac{9}{15}\log_2\frac{9}{15} = 0.971$
4 attributes: Age, Has_job, Own_house, Credit_rating
Measure the information gain for each attribute
  Entropy_{age}(D)
  Entropy_{has_job}(D)
  Entropy_{own_house}(D)
  Entropy_{credit_rating}(D)

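Tree Design: Entropy_{age}(D)

Age: young, middle, old

Age     Yes  No  entropy(D_i)
young   2    3   0.971
middle  3    2   0.971
old     4    1   0.722

$\mathrm{Entropy}(\mathrm{young}) = \mathrm{Entropy}(\mathrm{middle}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971$
$\mathrm{Entropy}(\mathrm{old}) = -\frac{4}{5}\log_2\frac{4}{5} - \frac{1}{5}\log_2\frac{1}{5} = 0.722$
$\mathrm{Entropy}_{age}(D) = \frac{5}{15}\times 0.971 + \frac{5}{15}\times 0.971 + \frac{5}{15}\times 0.722 = 0.888$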

Tree Design: Entropy_{has_job}(D)

Has_job: True, False

has_job  Yes  No  entropy(D_i)
TRUE
FALSE

Entropy(TRUE) =
Entropy(FALSE) =
Entropy_{has_job}(D) =

Tree Design: Entropy_{own_house}(D)

Own_house: True, False

own_house  Yes  No  entropy(D_i)
TRUE
FALSE

Entropy(TRUE) =
Entropy(FALSE) =
Entropy_{own_house}(D) =

Tree Design: Entropy_{credit_rating}(D)

Credit_rating: fair, good, excellent

credit_rating  Yes  No  entropy(D_i)
fair
good
excellent

Entropy(fair) =
Entropy(good) =
Entropy(excellent) =
Entropy_{credit_rating}(D) =

Tree Design: information gain

Entropy(class) = 0.971
Entropy_{age}(D) = 0.888
  Info gain from “Age” = 0.971 – 0.888 = 0.083
Entropy_{has_job}(D) = 0.647
  Info gain from “job” = 0.971 – 0.647 = 0.324
Entropy_{own_house}(D) = 0.551
  Info gain from “house” = 0.971 – 0.551 = 0.420
Entropy_{credit_rating}(D) = 0.608
  Info gain from “credit” = 0.971 – 0.608 = 0.363
Largest information gain: own_house, so it is chosen for the first split
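The entropy and information-gain arithmetic can be checked with a short script. This sketch uses only figures stated in the slides: the 9 Yes / 6 No class split, the Yes/No counts per age group, and the four conditional entropies. The function names `entropy` and `split_entropy` are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a class-count tuple such as (yes, no)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def split_entropy(partitions):
    """Weighted average entropy after splitting the data into partitions."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)


class_counts = (9, 6)                      # (Yes, No) over all 15 records
age_partitions = [(2, 3), (3, 2), (4, 1)]  # young, middle, old as (Yes, No)

h_class = entropy(class_counts)
h_age = split_entropy(age_partitions)
print(f"Entropy(class) = {h_class:.3f}")
print(f"Entropy_age(D) = {h_age:.3f}")
print(f"Info gain from Age = {h_class - h_age:.3f}")

# Conditional entropies as stated in the slides; the attribute with the
# largest gain (smallest conditional entropy) is chosen for the first split.
conditional = {"age": 0.888, "has_job": 0.647, "own_house": 0.551, "credit_rating": 0.608}
gains = {attr: round(0.971 - h, 3) for attr, h in conditional.items()}
best = max(gains, key=gains.get)
print(gains, "-> split on", best)
```

The has_job, own_house and credit_rating entropies are taken from the slide totals rather than recomputed, because their per-value counts are not given here.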

Case Study (Violent Crime – Example 6)

Crimes
  Red: an arrest for a violent crime
  Yellow: an arrest for a crime that is not violent
  Green: no arrest
Factors:
  Age
  Number of prior arrests
Can be used for prediction: cases with unknown outcomes

Case Study (Predicting Crime – Example 7)

Crime records from 2005 to 2015
  Offence, number of male students, number of female students, number of 100-level, 200-level, 300-level and 400+-level students
  Programme, sex, offence, expulsion period and level

Summary

Data: 3Vs: volume, velocity, variety
Machine learning approach:
  Investigate organized crimes, reveal hidden network structures among criminals and their roles, detect subgroups
  Develop preventive measures to stop crimes from taking place
Link analysis: discover knowledge from connections/relationships, characterize relationships, identify groups/subgroups
  Facilitates crime investigation and social network investigation
  Requires human intervention

Summary

Text mining
  Discover knowledge from textual data (keyword extraction)
  Handle the velocity of big textual data
Clustering & decision tree
  Discover patterns from historical data
  Supervised: data are labeled
    Used for prediction
  Unsupervised: data are unlabeled
    Extract interesting patterns
