Forensic Framework Forensics Problemsnflaw/EIE4114Sem22018-19/part3s.pdf · Computational Forensics...
Machine Learning Forensics
Forensic Framework
Collection: identify and collect digital evidence
  Selective acquisition? Cloud storage?
  Generate data subset for examination
Examination of evidence
  String search
  Pattern matching
  Data visualization (timeline analysis)
Analysis: determine data significance and draw conclusions
  Data mining: cluster analysis, discriminant analysis, rule mining
  Attribution: “Who did it?” (source)
  Authentication: synthetic data? forgery?
Presentation
Forensics Problems
Data is evidence
  Collect data from every source (credit card transactions, cell phone calls, email, chat, browser history, documents, data stored in databases, …)
Web and wireless crimes (digital)
  Big volume, high velocity, heterogeneous in nature
  Recognizing patterns and analyzing data require huge manpower
Computational Forensics
Computational methods applied to forensics
Large-scale investigations
  Large volumes of data from a wide range of sources
  E.g., malware traces: identify patterns
Automation
  Simple triangulation method to estimate the location of an IP address within one to two minutes
Analysis: machine learning
Machine Learning
“gives computers the ability to learn without being explicitly programmed”
“provides systems the ability to automatically learn and improve from experience”
The process of learning begins with observations or data (examples, direct experience) in order to look for patterns in the data and make better decisions in the future based on the examples provided
Machine Learning Forensics
Analyze vast amounts of data to discover risk and to detect criminal behaviour (recognize patterns of criminal activities)
Seeks to learn from experience/data to predict future criminal behaviour (prevent digital crimes or apply real-time countermeasures in response)
Incorporate ML in Forensics
Analysis: use of machine learning to derive knowledge that addresses the purpose of the investigation
Reporting/Presentation
  Describe the actions used, explain how tools and procedures were selected, and determine what other actions need to be performed (e.g., examining additional data sources, attributes and variables; securing identified vulnerabilities; providing recommendations for improvements to policies, procedures, tools, etc.)
Example: Fraud Detection
Step 1: understand the investigation objective
  Know the requirements from a business or law enforcement perspective
  Convert them into a forensic problem definition
  Draft a preliminary plan, outline the benefits of the machine learning approach
Step 2: understand the data
  Understand the fraud or crime that needs to be detected; ensure appropriate data sets are collected
  Identify data quality problems and see how they will impact the results obtained
  Form hypotheses
Step 3: data preparation strategy
  Data attribute selection
  Data cleaning/transformation: how to deal with missing values
Step 4: forensic modeling
  Several approaches are possible for the same fraud detection
  Construction of multiple models/approaches to compare error rates
  Used in co-operation?
Step 5: investigation evaluation
  Evaluate the results and review the steps used to construct them
  High number of false positives?
    Customers will get upset
    Business processes may be delayed
Step 6: detection deployment
  Any model requires a commitment to continuous learning and improvement, with automated monitoring
  Refresh the model to capture the ever-changing characteristics of criminal avoidance
  E.g., fraudsters’ patterns change too rapidly for a purely rules-based approach to be effective
Example: https://www.ncr.com/financial-services/enterprise-fraud-prevention/fractals
Terminology
Machine Learning
  Learn from past experiences: past experiences are represented by the data
Methods
  Supervised: data are labeled with pre-defined categories, which provide the supervision
  Unsupervised: data are unlabeled; the aim is to identify patterns
Terminology
Extractive Forensics
  Goal: extract relationships, discover networks of associations, find key concepts from unstructured content
  Link analysis, text mining
Inductive Forensics
  Clustering incidents and crimes
  Unsupervised learning: determine how the data are organized
Deductive Forensics
  Decision tree, rule generators
Application example
A credit card company receives thousands of applications for new cards. Each application contains information about an applicant: age, marital status, annual salary, outstanding debts, credit rating, …
Problem: decide whether an application should be approved, i.e., classify applications into two categories, approved and not approved.
Application example: Machine Learning Forensics
Earliest applications: employed by credit card issuers
  Monitor and detect potential credit card theft
  Dataset: contains both legal and illegal transactions
  Learn from these transactions, make predictions for “future transactions”
An example statement
Data: credit card application data
Task: predict whether an application should be approved or not
Performance measure: accuracy
No learning: classify all future applications (test data) to the majority class (i.e., Yes):
  Accuracy = 9/15 = 60%
We can do better than 60% with learning.
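The "no learning" baseline can be sketched in a few lines of Python. This is only an illustration: it assumes a set of 15 labeled applications with 9 "Yes" and 6 "No", which is what the 9/15 figure on the slide suggests, and the function name `majority_baseline_accuracy` is made up for this sketch.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Assumed label distribution: 9 approved ("Yes") and 6 not approved ("No").
labels = ["Yes"] * 9 + ["No"] * 6
label, accuracy = majority_baseline_accuracy(labels)
print(label, f"{accuracy:.0%}")
```

Any learned model should be compared against this figure; a classifier that cannot beat the majority class has learned nothing useful.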
Extractive Forensics: Link Analysis and Text Mining
Link Analysis
Aim: uncover hidden associations
  “who knew whom, where and when”
Initial type of analysis
  Analyze cell phone calls (numbers that have been dialed), emails, text messages between suspects and associates, transactions during a given time frame
Circles (nodes): individuals/companies
Links (edges): convey strengths of relationships
  The stronger the link, the thicker the line connecting them
Provides a graphical network displaying crucial relationships
Hope: simplifies and narrows the scope of investigation
Search for outliers, understand known patterns, discover new patterns
Example (Handout)
A list of suspects: {Able, Baker, Charley, David, Edwards and Frank}
Participated in various activities
  Baker was involved in “weapons theft” and “bombing”
  Charley was involved in “weapons theft”
  Edwards was involved in “bombing”
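A minimal sketch of how the handout example could be turned into a link network: suspects are nodes, and two suspects are linked when they share an activity, with the number of shared activities as the link strength. The slide lists activities for Baker, Charley and Edwards only, so the other suspects are left with empty sets here; `build_links` is a hypothetical helper, not part of any link-analysis tool.

```python
from itertools import combinations

# Involvement as stated in the handout example; activities of the remaining
# suspects are not given on the slide, so they are left empty (assumption).
involvement = {
    "Able": set(),
    "Baker": {"weapons theft", "bombing"},
    "Charley": {"weapons theft"},
    "David": set(),
    "Edwards": {"bombing"},
    "Frank": set(),
}

def build_links(involvement):
    """Return {(a, b): shared_activities} for each pair sharing an activity."""
    links = {}
    for a, b in combinations(sorted(involvement), 2):
        shared = involvement[a] & involvement[b]
        if shared:
            links[(a, b)] = shared
    return links

links = build_links(involvement)
for (a, b), shared in links.items():
    # The weight plays the role of line thickness in a link chart.
    print(f"{a} -- {b}  weight={len(shared)}  via {sorted(shared)}")
```

With this data Baker is the only suspect connected to two others, which is the kind of observation a link chart is meant to surface before a human interprets it.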
Example
Case study, major drug case: https://www.youtube.com/watch?v=FzmrLDHXJ50
Visualizing call data records: https://www.youtube.com/watch?v=J38tKqq9kpY
Analyzing a criminal network using link analysis: https://www.youtube.com/watch?v=UYdXOXpT9wM
Link Analysis
Two roles for investigators
  Enables visualization of relationships
  Leads to the discovery of different types of node associations
Weakness
  Requires human interpretation
Example usage
  Dept of Homeland Security: uses link analysis to create networks of associations in travelers’ screenings
Self-study
Analyst’s Notebook
  https://youtu.be/EIFu_oUiaBY
  https://www.youtube.com/watch?v=FdXlZ95xF40
NetMiner: http://www.netminer.com/product/features.do
  Demo: http://www.netminer.com/product/demo.do#analy
  Education copy
MarketVisual: provides online visual relationship mapping. This website can discover relationships of companies or persons under investigation (http://www.marketvisual.com/)
Case Study (ex 1 and ex 2)
Text Mining
Aim: sorting and organizing massive amounts of unstructured information
  Documents, notes, emails, chat, web forms, voice mail, records, spreadsheets, presentations, invoices, blogs, …
  Extract names, email addresses, IP addresses, …
A company typically has nearly 80% of its data in unstructured formats
Applications: text categorization, text clustering, concept extraction, sentiment analysis, document summarization
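Extracting email addresses and IP addresses from unstructured text can be illustrated with regular expressions. The patterns below are deliberately simple and are an assumption of this sketch, not a complete validator; the sample note and the helper `extract_entities` are invented for illustration.

```python
import re

# Simplified patterns (assumptions): good enough for a demo, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_entities(text):
    """Return the distinct email addresses and valid IPv4 addresses in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    ips = sorted(
        ip for ip in set(IPV4_RE.findall(text))
        if all(0 <= int(part) <= 255 for part in ip.split("."))
    )
    return {"emails": emails, "ips": ips}

# Invented sample text.
note = ("Login from 192.168.1.20 at 02:14; contact j.doe@example.com "
        "or admin@example.org. Ignore 999.1.1.1.")
found = extract_entities(note)
print(found)
```

The range check on each octet is what rejects strings such as 999.1.1.1 that merely look like addresses.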
Text categorization
Problem: label text with one or more category labels
  Collect some sample documents for each of the categories
  Train a classifier
  Run the classifier on new data
  The classifier will return a label
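The collect, train, run, label steps above can be sketched with a small multinomial naive Bayes classifier. Naive Bayes is chosen here only because it fits in a few lines of standard-library Python; the slides do not name a specific classifier, and the four training messages and both category names are invented.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(samples):
    """samples: list of (text, label). Returns the counts used for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training samples for two hypothetical categories.
samples = [
    ("wire the funds to this offshore account today", "suspicious"),
    ("urgent transfer funds before the audit", "suspicious"),
    ("lunch meeting moved to friday", "routine"),
    ("please review the attached meeting notes", "routine"),
]
model = train(samples)
print(classify("transfer the funds today", model))
```

In practice far more samples per category are needed, and the classifier should be evaluated on held-out documents before its labels are trusted in an investigation.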
Example: customer experience management
Use text analysis software to “listen, analyze, relate and act on millions of customer conversations”
Whirlpool, microwave oven incident: machines were arcing and producing electrical sparks, causing food inside to smoke
  Keyword searching: “arcing”, “smoke”
  Found 18500 records
  Formed a team to further analyze these records
  Made 700 calls to customers who might have the problem
Case Study (ex 3 and ex 4)
Inductive Forensics: Clustering of crimes and incidents
Clustering
Forensic process of decomposing a set of observations into several subsets that have some similarities
Seeks to determine how data are organized
Real-life examples
Example 1: group people of similar sizes: “small”, “medium”, “large” T-shirts
  Tailor-made for each person: too expensive
  One size fits all: does not fit all
Example 2: marketing, segment customers according to their similarities
  Targeted marketing
Real-life examples
Example 3:
  Insurance: detect groups of people who stage accidents to collect insurance
  Money laundering: detect suspicious money transactions
  Telecom industry: find calling patterns that deviate from a norm
Clustering
Clustering is performed autonomously by the program directly from the data
Exploratory type of analysis
  Isolate anomalies in the data (questionable financial transactions, suspicious use of a computer port)
Conducting inductive forensics investigations
  Identify the source of words/behaviors you want to cluster (e.g., suspect names, phone numbers, IP addresses)
  Build and train a cluster of words: use software to let words organize themselves into groups
  Evaluate the accuracy against new words
  Examine the clusters that have been revealed
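Step 1
Consider two clusters (k=2)
Randomly choose two centroids
  Centroid   v1   v2
  Group 1    1    1
  Group 2    5    7
Step 2: distance
                     Group 1 centroid (1, 1)     Group 2 centroid (5, 7)
  Point  v1   v2     dv1   dv2   distance        dv1    dv2    distance    Group
  1      1    1      0     0     0               -4     -6     7.211103    1
  2      1.5  2      0.5   1     1.118034        -3.5   -5     6.103278    1
  3      3    4      2     3     3.605551        -2     -3     3.605551    1
  4      5    7      4     6     7.211103        0      0      0           2
  5      3.5  5      2.5   4     4.716991        -1.5   -2     2.5         2
  6      4.5  5      3.5   4     5.315073        -0.5   -2     2.061553    2
  7      3.5  4.5    2.5   3.5   4.301163        -1.5   -2.5   2.915476    2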
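The distances in Step 2 are consistent with the ordinary Euclidean distance between a point and a centroid. As a worked check, assuming the initial centroids (1, 1) for Group 1 and (5, 7) for Group 2 and taking point 7 = (3.5, 4.5):

```latex
d(x, c) = \sqrt{(x_{1} - c_{1})^{2} + (x_{2} - c_{2})^{2}}

d(x_7, c_{\text{Group 1}}) = \sqrt{(3.5 - 1)^{2} + (4.5 - 1)^{2}} = \sqrt{18.5} \approx 4.301

d(x_7, c_{\text{Group 2}}) = \sqrt{(3.5 - 5)^{2} + (4.5 - 7)^{2}} = \sqrt{8.5} \approx 2.915
```

Point 7 is closer to the Group 2 centroid, so it is assigned to Group 2.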
Step 3
2 groups: {1,2,3} and {4,5,6,7}
New centroids:
  Group 1: Variable 1 = (1+1.5+3)/3 = 1.83, Variable 2 = (1+2+4)/3 = 2.33
  Group 2: Variable 1 = (5+3.5+4.5+3.5)/4 = 4.12, Variable 2 = (7+5+5+4.5)/4 = 5.38
Iteration: use new centroids
  Point  v1   v2     Group 1   Group 2
  1      1    1
  2      1.5  2
  3      3    4
  4      5    7
  5      3.5  5
  6      4.5  5
  7      3.5  4.5
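Step 4
2 groups: {1,2} and {3,4,5,6,7}
New centroids:
  Group 1: Variable 1 = (1+1.5)/2 = 1.25, Variable 2 = (1+2)/2 = 1.5
  Group 2: Variable 1 = (3+5+3.5+4.5+3.5)/5 = 3.9, Variable 2 = (4+7+5+5+4.5)/5 = 5.1
Iteration: use new centroids
  No change in group membership: stop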
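  Centroid   v1     v2
  Group 1    1.25   1.5
  Group 2    3.9    5.1
                     Group 1 centroid (1.25, 1.5)   Group 2 centroid (3.9, 5.1)
  Point  v1   v2     dv1    dv2   distance          dv1    dv2    distance    Group
  1      1    1      -0.25  -0.5  0.559017          -2.9   -4.1   5.021952    1
  2      1.5  2      0.25   0.5   0.559017          -2.4   -3.1   3.920459    1
  3      3    4      1.75   2.5   3.051639          -0.9   -1.1   1.421267    2
  4      5    7      3.75   5.5   6.656763          1.1    1.9    2.19545     2
  5      3.5  5      2.25   3.5   4.160829          -0.4   -0.1   0.412311    2
  6      4.5  5      3.25   3.5   4.776243          0.6    -0.1   0.608276    2
  7      3.5  4.5    2.25   3     3.75              -0.4   -0.6   0.72111     2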
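The worked example can be reproduced with a short k-means sketch. It assumes the seven points and the initial centroids (1, 1) and (5, 7) from the slides, uses squared Euclidean distance (which gives the same nearest centroid as Euclidean distance), breaks ties in favour of the first group, and assumes no group ever becomes empty. The function names are illustrative only.

```python
def squared_distance(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the first)."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def update(points, groups, k):
    """Mean of the points in each group (assumes no group is empty)."""
    centroids = []
    for g in range(k):
        members = [p for p, grp in zip(points, groups) if grp == g]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

def kmeans(points, centroids):
    groups = None
    while True:
        new_groups = assign(points, centroids)
        if new_groups == groups:  # membership unchanged: stop
            return centroids, groups
        groups = new_groups
        centroids = update(points, groups, len(centroids))

# Points 1..7 as (v1, v2); initial centroids are points 1 and 4.
points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5), (3.5, 4.5)]
centroids, groups = kmeans(points, [(1, 1), (5, 7)])
print(centroids)
print([g + 1 for g in groups])  # group numbers as used on the slides
```

Point 3 starts in Group 1 because of the tie in the first pass and moves to Group 2 once the centroids are recomputed, after which the memberships no longer change.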
Clustering
Demonstration: https://www.viscovery.net/demos/click-stream-analysis
Case Study (ex 5)
Health Care Fraud Detection
Problem: high volume of data
Framework
Clustering: unsupervised search for anomalous patterns
Deductive Forensics: Decision Tree, Rule Generators
Definition
  Models historical data so as to predict criminal outcomes or fraudulent events
  Identify when and where criminal activity will take place and in what format
  Uncover problems before they occur
Example 1: fraud detection
  Identify anomalies in credit card activity or insurance claims activity so that fraud is detected early and losses are minimized
Example 2:
  From historical data: discover that most car thefts occur between midnight and 2am on week nights in hotel parking lots
  Solution: monitor hotel parking lots more often between midnight and 2am on week nights
Decision Tree
Case: a credit card company receives thousands of applications for new cards (or a company receives applications for loans)
  Each application contains information such as age, marital status, annual salary, outstanding debts, credit rating, …
Problem: decide whether an application should be approved or not
Outcome: “approved” and “not approved”
Problem
Learn a model from the “historical” data
Use the model for prediction
What will be the output?
Decision Tree Learning
Widely used technique
Tree Design
Want a small but accurate tree
Hot research topic
Main idea:
  Reduce uncertainty
  Subset of data: belong to the same class
(B) is better
Entropy: measure of uncertainty; information gain is the reduction in entropy after a split
Tree Design
Step 1: calculate the entropy of the outcome:
  2 classes: Yes and No
  Prob(Yes) = 9/15, Prob(No) = 6/15
  Entropy(class) = $-\frac{6}{15}\log_2\frac{6}{15} - \frac{9}{15}\log_2\frac{9}{15} = 0.971$
4 attributes: Age, Has_job, Own_house, Credit_rating
Measure information gain for each attribute
  Entropy_{age}(D)
  Entropy_{has_job}(D)
  Entropy_{own_house}(D)
  Entropy_{credit_rating}(D)
Tree Design: Entropy_{age}(D)
Age: young, middle, old
  Age      Yes   No   entropy(D_i)
  young    2     3    0.971
  middle   3     2    0.971
  old      4     1    0.722
Entropy(young) = $-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971$
Entropy(middle) = 0.971
Entropy(old) = $-\frac{4}{5}\log_2\frac{4}{5} - \frac{1}{5}\log_2\frac{1}{5} = 0.722$
Entropy_{age}(D) = $\frac{5}{15}\times 0.971 + \frac{5}{15}\times 0.971 + \frac{5}{15}\times 0.722 = 0.888$
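The entropy figures for the class and for the Age attribute can be checked with a short script. It uses only the Yes/No counts shown on the slides (9/6 overall; 2/3, 3/2 and 4/1 for young, middle and old); the function names are illustrative.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(partitions):
    """Weighted average entropy of the subsets produced by an attribute."""
    total = sum(sum(counts) for counts in partitions.values())
    return sum(sum(counts) / total * entropy(counts)
               for counts in partitions.values())

class_counts = (9, 6)  # (Yes, No)
age_partitions = {"young": (2, 3), "middle": (3, 2), "old": (4, 1)}

h_class = entropy(class_counts)
h_age = conditional_entropy(age_partitions)
gain_age = h_class - h_age
print(round(h_class, 3), round(h_age, 3), round(gain_age, 3))
```

The same two functions apply unchanged to the other attributes once their Yes/No counts per value are filled in.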
Tree Design: Entropy_{has_job}(D)
Has_job: true, false
  has_job   Yes   No   entropy(D_i)
  TRUE
  FALSE
Entropy(TRUE) =
Entropy(FALSE) =
Entropy_{has_job}(D) =
Tree Design: Entropy_{own_house}(D)
Own_house: true, false
  own_house   Yes   No   entropy(D_i)
  TRUE
  FALSE
Entropy(TRUE) =
Entropy(FALSE) =
Entropy_{own_house}(D) =
Tree Design: Entropy_{credit_rating}(D)
Credit_rating: fair, good, excellent
  credit_rating   Yes   No   entropy(D_i)
  fair
  good
  excellent
Entropy(fair) =
Entropy(good) =
Entropy(excellent) =
Entropy_{credit_rating}(D) =
Tree Design: info gain
Entropy(class) = 0.971
Entropy_{age}(D) = 0.888
  Info gain from “Age” = 0.971 - 0.888 = 0.083
Entropy_{has_job}(D) = 0.647
  Info gain from “job” = 0.324
Entropy_{own_house}(D) = 0.551
  Info gain from “house” = 0.42
Entropy_{credit_rating}(D) = 0.608
  Info gain from “credit” = 0.363
Largest information gain: own_house!
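Choosing the split attribute amounts to subtracting each conditional entropy from the class entropy and taking the largest result. The sketch below uses the rounded entropy values quoted on the slide; the dictionary keys are labels chosen for this example.

```python
# Rounded values taken from the slide.
entropy_class = 0.971
conditional = {
    "age": 0.888,
    "has_job": 0.647,
    "own_house": 0.551,
    "credit_rating": 0.608,
}

# Information gain = class entropy minus entropy after splitting on the attribute.
gains = {attr: entropy_class - h for attr, h in conditional.items()}
best = max(gains, key=gains.get)

for attr, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr:14s} gain = {gain:.3f}")
print("split on:", best)
```

The attribute with the largest gain becomes the root of the tree, and the same procedure is then repeated on each resulting subset.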
Case Study (Violent Crime, Example 6)
Crimes
  Red: an arrest for a violent crime
  Yellow: an arrest for a crime that is not violent
  Green: no arrest
Factors:
  Age
  Number of prior arrests
Can be used for prediction: cases with unknown outcomes
Case Study (Predicting Crime, Example 7)
Crime records from 2005-2015
  Offence, no of male students, no of female students, no of 100 level students, no of 200 level students, no of 300 level students and no of 400+ level students
  Programme, sex, offence, expulsion period and level
Summary
Data: 3Vs: volume, velocity, variety
Machine learning approach:
  Investigate organized crimes, reveal hidden network structures among criminals and their roles, detect subgroups
  Develop preventive measures to stop crimes from taking place
Link analysis: discover knowledge from connections/relationships, characterize relationships, identify groups/subgroups
  Facilitates crime investigation, social network investigation
  Requires human intervention
Summary
Text mining
  Discover knowledge from textual data (keyword extraction)
  Handling the velocity of big textual data
Clustering & decision tree
  Discover patterns from historical data
  Supervised: data are labeled
    Used for prediction
  Unsupervised: data are unlabeled
    Extract interesting patterns