Copyright © 2014 Splunk Inc.
Fred Wilmot (CISSP), Director, Global Security Practice
Sebastien Tricaud, Principal Strategist, Global Security Practice
Machine Learning, Entropy and Fraud in Splunk
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• What is Machine Learning?
• Use cases
• Results
• Lessons learned
WARNING
Do not visit the URLs in this presentation; they will make your computer sick!
Machine Learning Goal
Program computers to use example data or past experience to solve a given problem
Some Machine Learning Use Cases
• User behavior profiling and baselining
• Asset and application modeling
• Finding new security threats
  – SQLi
  – Network proxy/DNS/evaluation
  – Sentiment from SLA (semantic language analysis)
  – Exfiltration
  – C2 channels / malware
• Fraud
Master Machine Learning in 2 slides!
Machine that Learns
• Algorithms: types of learning
• Input vectors
• Outputs
• Training regimes
• Noise
• Performance evaluation
Learn – Classify – Cluster
• Learning:
  – Is “Subject: Fais grandir ton machin” (French, roughly “make your thing grow”) spam?
  – Is “jet-machinery.com” a valid URL?
  – Store what we know in a good or bad dataset
• Classify (supervised/semi-supervised learning):
  – Based on what has been learned, try to put things into the good or bad dataset, and re-evaluate the model.
• Cluster (unsupervised learning):
  – Group objects in a geometrical space
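Clustering can be illustrated with k-means, one common way to group objects in a geometrical space. The deck does not say which clustering algorithm the authors used, and the 2-D feature values below (think URL length and entropy) are made up for illustration; this is a minimal sketch, not their implementation.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters

# Hypothetical feature vectors, e.g. (URL length, entropy) per request.
points = [(10, 2.0), (11, 2.1), (12, 1.9), (60, 4.5), (62, 4.4), (58, 4.6)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Nothing is labeled in advance here: the two groups emerge from the geometry alone, which is what separates clustering from classification.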
Use Cases
• Domain analysis for threat detection
• SQL injection attack detection
• Web-based financial fraud
Use case: Threat detection via Domain Analysis
• www.google.com: known good URL
• www.g0ogle.com: really close to a known good URL… probably malicious!
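A lookalike such as www.g0ogle.com sits a single edit away from the known-good name. A minimal sketch of the textbook dynamic-programming Levenshtein distance follows; the deck does not show the authors' own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("www.google.com", "www.g0ogle.com"))
```

Only two rows of the distance table are kept at a time, so memory grows with the length of b rather than with len(a) × len(b).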
Accelerate your Hunting Shannon!
URLs from web logs and email → ML: Levenshtein distance and Shannon entropy → anomalous URLs
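Shannon entropy measures how random a string looks, H = -Σ p·log2(p) over the relative frequency p of each character. A minimal sketch of the per-character entropy of a name; the sample names are illustrative, and any alerting threshold would have to come from your own data.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy of the characters of s, in bits per character:
    H = -sum(p * log2(p)) over the relative frequency p of each character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())

for name in ["google", "splunk", "x7qk2vz9rt4w"]:
    print(name, round(shannon_entropy(name), 3))
```

A name whose characters are all distinct, such as splunk or the random-looking x7qk2vz9rt4w, reaches the maximum log2(length); repeated characters lower the score. Paired with a Levenshtein comparison against known-good names, a high score is a hint, not proof, that a URL is anomalous.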
Working with Data
• #1 rule: be sure to ingest the data properly
  – ‘CIM’ the data
  – Make sure fields are extracted
  – Make sure the data is sourcetyped appropriately
• #2 rule: make sure you understand your data’s context
• #3 rule: choose an algorithm you understand to evaluate the data
• #4 rule: have a general idea of what your outcome should be
• #5 rule: see #1 rule
Example: how do you get the entropy of a subdomain properly? Consume/extract URLs → apply Shannon entropy → validate with results
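The three-step example can be sketched end to end. This is a minimal illustration under a loudly naive assumption: the subdomain is everything left of the last two host labels, which is wrong for suffixes such as sch.uk; the helper name subdomain and the sample URLs are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character (0.0 for an empty string)."""
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values()) if n else 0.0

def subdomain(url: str) -> str:
    """Naive subdomain extraction (hypothetical helper): every host label
    left of the last two. Assumes a two-label registered domain such as
    splunk.com; suffixes like sch.uk need a public-suffix-aware parser."""
    if "//" not in url:
        url = "//" + url  # let urlsplit treat a bare host as the netloc
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[:-2])

# Consume/extract URLs -> apply Shannon entropy -> validate the results.
for url in ["http://www.splunk.com/", "http://a9x3kq7z.evil.example/login"]:
    sub = subdomain(url)
    print(url, repr(sub), round(shannon_entropy(sub), 3))
```

Validation is the step that catches extraction mistakes: if "www" keeps showing up with odd scores, the parser, not the data, is probably at fault.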
Detecting the No. 1 Programming Error
Detecting SQLi
Web proxy logs, web access logs → stochastic gradient descent: Bayesian, naive Bayesian and bag of words → 92% true positive rate
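The slide pairs Bayesian classification with a bag of words. A minimal sketch of a multinomial naive Bayes classifier over query strings, with a tokenizer of our own choosing and a tiny, made-up training set; it is not the authors' model and says nothing about how the 92% figure was obtained.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list:
    """Bag-of-words tokens: lowercase alphanumeric words plus each
    punctuation character on its own (quotes, '=', '-' matter for SQLi)."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens
        self.doc_counts = Counter()  # label -> number of training examples

    def train(self, text: str, label: str) -> None:
        self.word_counts.setdefault(label, Counter()).update(tokens(text))
        self.doc_counts[label] += 1

    def classify(self, text: str) -> str:
        vocab_size = len(set().union(*self.word_counts.values()))
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(label) + sum over tokens of log P(token | label)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens(text):
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny, made-up training set of query strings.
model = NaiveBayes()
for q in ["id=42&page=2", "q=splunk+conf&lang=en", "user=fred&view=profile", "cat=books&sort=price"]:
    model.train(q, "good")
for q in ["id=1' or '1'='1", "id=1 union select username, password from users",
          "q=x'; drop table users; --", "id=1 and 1=1 --"]:
    model.train(q, "sqli")

print(model.classify("id=7 union select password from users"))
print(model.classify("page=3&sort=name"))
```

Working in log space avoids underflow when many small token probabilities are multiplied, and add-one smoothing keeps an unseen token from zeroing out a class.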
Why is Fraud detection so slow?
• Authenticated transactions are… well… authenticated ☹
• Slight variations in user behavior are hard to detect
• Manual processes require multiple people
Math saves Bank$
Web logs with session keys, screen resolution and user name → randomness of the key sizes and the n-grams of keys, clustering to find outliers → discover hijacked, proxied sessions
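A minimal sketch of the “key size and randomness” idea. Two loud assumptions: a modified z-score based on the median absolute deviation (MAD) stands in for the clustering step, and n-grams are left out. The session keys are made up.

```python
import math
import statistics
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Randomness of a key in bits per character (0.0 for an empty string)."""
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values()) if n else 0.0

def robust_outliers(values, threshold=3.5):
    """Flag values far from the median using the modified z-score
    0.6745 * |x - median| / MAD. If MAD is 0 (most values identical),
    any value that differs from the median is flagged."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [v != med for v in values]
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

def suspicious_sessions(sessions: dict) -> list:
    """sessions maps a session ID to its session key; returns the IDs whose
    key size or key randomness is an outlier compared with the others."""
    ids = list(sessions)
    size_flags = robust_outliers([len(sessions[i]) for i in ids])
    entropy_flags = robust_outliers([shannon_entropy(sessions[i]) for i in ids])
    return [i for i, a, b in zip(ids, size_flags, entropy_flags) if a or b]

# Made-up session keys: five look alike, two do not.
sessions = {
    "s1": "k3j9x2m7q1w8e5r4", "s2": "a1b2c3d4e5f6g7h8", "s3": "z9y8x7w6v5u4t3s2",
    "s4": "p0o9i8u7y6t5r4e3", "s5": "m1n2b3v4c5x6z7l8",
    "s6": "aaaaaaaabbbbbbbb",  # right size, far too little randomness
    "s7": "k3j9x2m7",          # wrong size
}
print(suspicious_sessions(sessions))
```

The median and MAD are used instead of the mean and standard deviation because a single hijacked session would otherwise drag the baseline toward itself.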
So how does all this work??
Short answer…
You install a couple of apps and train the models for a bit… and that’s it.
No really, what’s under the hood?
Aah…
Our Data Journey: ML Exploration Scope
Questions
• How much data will this evaluation require?
• What kind of data can we apply our learning to?
• What data sources will we need to work with to get a valuable result?
• Can we understand good/bad using algorithms?
Assumptions
• Scaled test infrastructure
• High-quality data
• Machine learning functions written in Splunk
• Our approach will get results
• Iteration and collaboration on training sets
Splunk + ML Flow
Diagram: Data → Label + Data → Index → Label + Data → Search → Machine Learning Framework → (Results + Tag) + ML → results stored in the K/V store
Design Decisions
• Search time?
• Index time?
• Data stores and choices?
• How would we relate values calculated at search time back to the raw data at ingest time?
• Do we have reference data?
• Batch or near-real-time ML evaluation?
• We made two different choices for testing: index-time and search-time ML.
Index-time requirements
• We need a unique identifier for each event, or we can’t relate the evaluated features back to the raw data.
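A minimal sketch of why the identifier matters: it is attached at ingest time, so any value calculated later can be joined back to the exact raw event. The field name uuid and the placeholder feature are hypothetical; in the deck, it is Menage's proxy thread that adds the UUID.

```python
import uuid

def tag_event(raw: str) -> dict:
    """Ingest time: attach a unique identifier to a raw event before indexing."""
    return {"uuid": str(uuid.uuid4()), "_raw": raw}  # "uuid" is a hypothetical field name

events = [tag_event(line) for line in [
    "GET http://www.google.com/ 200",
    "GET http://www.g0ogle.com/ 200",
]]

# Search time: an ML step computes features per event and stores them by UUID
# (the feature here is a placeholder: the length of the raw event).
features = {e["uuid"]: {"length": len(e["_raw"])} for e in events}

# Relating calculated values back to the raw data is then a lookup by UUID.
for e in events:
    print(e["_raw"], features[e["uuid"]])
```

Without the UUID, two identical raw lines could not be told apart, and a feature computed for one could end up attached to the other.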
Machine Learning Iteration and Algorithms
Requirements
• KV store for labels and raw data
• Methodology for interchangeable algorithms interacting with the KV store
• Iterative, scalable method for creating a reference data set
• Ability to label data and operate on it
Tools
• MLSET/MLGET
• Levenshtein (new)
• Bayes (new)
• Shannon entropy (new)
• WordCount (new SPL)
• Fast Fourier (new)
• Perceptron (coming soon)
• Gradient Descent (coming soon)
ML Architecture – Data Acquisition
Diagram: Forwarder → Menage (proxy thread, add UUID) → Indexes
ML Architecture – Data Evaluation
Diagram: Menage (proxy thread, add UUID) → Indexes; the user uses ML to evaluate the data, for example:
| anomalies field=file labelonly=true maxvalues=10 | bayes field=* | output entropy
The search adds a calculated field to the data, and the resulting Label::value is added to the event stream.
Using Key Value Persistent Cache
• Populate a Redis KV store based on ML search output.
• Label each event with the new Label::value, mapped to its UUID.
• Pass the Label::value at index time to Menage.
• Import a Redis module into Splunk as a lookup for a value given a key (or use the key store of your choice).
Redis is an open source, advanced key-value store.
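Splunk external lookups hand a script CSV rows on standard input and read the completed rows back from standard output. A minimal sketch of such a script, with a plain dict standing in for Redis; the uuid and label field names and the stored values are hypothetical. A real script would query Redis (or your key store of choice) and be registered as an external lookup in transforms.conf.

```python
import csv
import io

# Hypothetical stand-in for Redis: event UUID -> "Label::value".
KV_STORE = {"u-1": "entropy::3.58", "u-2": "levenshtein::1"}

def lookup_labels(infile, outfile, store) -> None:
    """Read CSV rows that carry a 'uuid' column, fill in the 'label' column
    from the key-value store, and write the rows back out as CSV."""
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if "label" not in fieldnames:
        fieldnames.append("label")
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["label"] = store.get(row.get("uuid") or "", "")
        writer.writerow(row)

# Simulate one request: the key column arrives, the script fills in the value.
request = io.StringIO("uuid,label\nu-1,\nu-3,\n")
response = io.StringIO()
lookup_labels(request, response, KV_STORE)
print(response.getvalue())
```

In a deployment the script would call lookup_labels(sys.stdin, sys.stdout, store) with a Redis-backed store; keys with no stored label simply come back empty.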
Evaluating Events with Reference Data
• Generate a list of the top 5 whitelisted domains, and use their words as the key list for the Levenshtein calculation. We want a reference list of known-good entropy values!
  – top_accepted_domains.csv
  – top_sites.txt
• Create a whitelist of users for all data (we may want to rate their risk at some point ;)
  – proxy_users.csv
  – index=bluecoat cs_username=* cs_categories="whitelist*" | lookup
• Pull down a PhishTank verified phishing mail list; we want a reference blacklist lookup:
  – phishtank_verified.csv
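A minimal sketch of using a whitelist as the Levenshtein key list: find the nearest known-good domain and flag names that are close to it without matching it exactly. The five whitelisted domains and the distance cutoff of 2 are hypothetical; the deck loads its list from top_accepted_domains.csv.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]

# Hypothetical top-5 whitelist standing in for top_accepted_domains.csv.
WHITELIST = ["google.com", "splunk.com", "paypal.com", "amazon.com", "microsoft.com"]

def nearest_known_good(domain: str, whitelist=WHITELIST):
    """Return (closest whitelisted domain, its edit distance to domain)."""
    return min(((w, levenshtein(domain, w)) for w in whitelist), key=lambda t: t[1])

def is_lookalike(domain: str, max_distance: int = 2) -> bool:
    """Close to a known-good domain without being identical to it."""
    return 0 < nearest_known_good(domain)[1] <= max_distance

for d in ["g0ogle.com", "splunk.com", "paypa1.com", "example.org"]:
    print(d, nearest_known_good(d), is_lookalike(d))
```

A distance of 0 means the domain is on the whitelist itself, which is why the check excludes it.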
Extracting a URL properly
Sample URL | TLD | Comments
http://www.brit.croydon.sch.uk | croydon.sch.uk | Third-level TLD allocated by the Local Education Authority
192.168.0.42 | (none) | IPv4 address, no TLD
www.splunk.42 | 42 | This is not an IP address; 42 is correct
www.example.paris | paris | gTLD extracted smoothly
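A minimal sketch reproducing the four cases in the table: IP literals have no TLD, the longest known public suffix wins, and an unknown final label is taken as is. The suffix set is a hypothetical, hard-coded excerpt; the real Public Suffix List used by parsers such as Faup is far larger and uses wildcard rules for some zones.

```python
import ipaddress

# Tiny, hypothetical excerpt of a public suffix list.
PUBLIC_SUFFIXES = {"com", "uk", "sch.uk", "croydon.sch.uk", "paris"}

def extract_tld(host: str):
    """Return the (effective) TLD of a host name, or None for an IP address."""
    try:
        ipaddress.ip_address(host)
        return None  # IPv4/IPv6 literal: there is no TLD
    except ValueError:
        pass
    labels = host.lower().rstrip(".").split(".")
    # Try suffixes from longest to shortest and keep the first known one ...
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            return suffix
    # ... otherwise fall back to the last label (e.g. "42" in www.splunk.42).
    return labels[-1]

for host in ["www.brit.croydon.sch.uk", "192.168.0.42", "www.splunk.42", "www.example.paris"]:
    print(host, extract_tld(host))
```

Checking for an IP literal first is what keeps 192.168.0.42 from being reported with TLD "42", while www.splunk.42 correctly falls through to the last-label rule.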
Using Evaluated Data for ML Features
Diagram: http://www.splunk.com/view/enterprise-security-app/SP-CAAAE8Z#tab_2 → FAUP → domain_without_tld: splunk, tld: com
Components: web server, Faup library with Lua input and output modules, Splunk state store
Questions the stored values answer: How many TLDs are “com”? How many domains are “splunk”?
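Counting how often each parsed field occurs turns questions like “How many TLDs are ‘com’?” into numeric features. A minimal sketch with in-memory counters standing in for the Splunk state store, and a naive split standing in for Faup's domain_without_tld and tld fields (so it mishandles suffixes such as sch.uk); the host list is made up.

```python
from collections import Counter

def split_host(host: str):
    """Naive (domain_without_tld, tld) split that assumes a one-label TLD."""
    labels = host.lower().split(".")
    return (labels[-2] if len(labels) >= 2 else "", labels[-1])

hosts = ["www.splunk.com", "docs.splunk.com", "apps.splunk.com",
         "www.google.com", "www.g0ogle.com"]

tld_counts, domain_counts = Counter(), Counter()
for h in hosts:
    domain, tld = split_host(h)
    tld_counts[tld] += 1
    domain_counts[domain] += 1

def rarity_features(host: str) -> dict:
    """ML features: how often this host's TLD and domain occur in the data."""
    domain, tld = split_host(host)
    return {"tld_count": tld_counts[tld], "domain_count": domain_counts[domain]}

print(rarity_features("www.g0ogle.com"))
```

A domain seen only once under a very common TLD, like g0ogle here, is exactly the kind of rare value worth feeding to the Levenshtein and entropy checks.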
MLSET/MLGET
Each event has a UUID, which the ML search commands MLSET and MLGET expect.
• This calculates and populates field values that we use as ML features to graph or represent the data.
• These calculations create the labels that distinguish ‘anomalies’ or ‘outliers’ in the grouping of data we are evaluating.
Search-time operation on Splunk data to put into the K/V stash:
index=bluecoat cs_host=* | lookup webfaup url as cs_host | lookup wordstats word as url_domain | rename url_domain as domain, ws_entropy as entropy | mlset algo="listlevenshtein" fields="domain,entropy"
Pulling the machine learning results back at search time:
index=bluecoat cs_host=* | mlget algo="listlevenshtein" | table in.domain,in.entropy,levenscores.*
Then we investigate the results, and graph!
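A rough Python analogue of the mlset/mlget round trip, assuming a plain dict as the K/V stash and a registry of interchangeable algorithms (one of the requirements listed earlier). mlset, mlget, the registry and the domainlength placeholder are hypothetical illustrations, not the authors' search commands or their listlevenshtein algorithm.

```python
from typing import Callable, Dict

# Hypothetical registry of interchangeable algorithms: name -> function(fields) -> results.
ALGORITHMS: Dict[str, Callable[[dict], dict]] = {}

def algorithm(name: str):
    """Decorator that registers an algorithm under a name."""
    def register(fn):
        ALGORITHMS[name] = fn
        return fn
    return register

@algorithm("domainlength")
def domain_length(fields: dict) -> dict:
    # Placeholder algorithm for illustration: the length of the domain field.
    return {"length": len(fields["domain"])}

def mlset(events, algo, fields, store):
    """Run algo on the chosen fields of each event; stash inputs and
    results in store under (algo, event UUID)."""
    for e in events:
        inputs = {f: e[f] for f in fields}
        store[(algo, e["uuid"])] = {"in": inputs, "out": ALGORITHMS[algo](inputs)}

def mlget(events, algo, store):
    """Join the stashed inputs and results back onto the events by UUID."""
    return [{**e, **store.get((algo, e["uuid"]), {})} for e in events]

store = {}
events = [{"uuid": "u-1", "domain": "splunk"}, {"uuid": "u-2", "domain": "g0ogle"}]
mlset(events, "domainlength", ["domain"], store)
for row in mlget(events, "domainlength", store):
    print(row["uuid"], row["in"], row["out"])
```

Keying the stash by (algorithm, UUID) lets several algorithms annotate the same event without overwriting each other, and swapping algorithms only means registering a new function.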
Results
• Wrote 4 algorithms for evaluating URLs for these use cases: malware, exfiltration, insider threat detection and phishing attacks
• Created a method to build ML into Splunk using a KV store
• Identified fraud and SQLi in proxy logs
• Make as few index-time decisions as possible to stay as close to real time as possible
Get URL Parser app
http://apps.splunk.com/app/1545/
Another approach to the same data…
For Security + Data Science N00bs
ML for Proxy logs
The Approach
• Apply the Machine Learning Framework to proxy data in order to classify it at index time, based on specific features of the data.
• Performs intelligent analysis on incoming data and classifies it
• Focuses on identifying SQL injection
• Because of the incremental training approach (stochastic gradient descent), it gets more accurate as more data is applied
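Incremental training with stochastic gradient descent means the model is updated one example at a time, so it can keep learning as new data arrives. A minimal sketch of an online logistic-regression classifier; the two features (quote characters and SQL keywords per request) and the training stream are hypothetical, and the deck does not specify which model or features the framework uses internally.

```python
import math

class OnlineLogisticRegression:
    """Logistic regression trained one example at a time with stochastic
    gradient descent, so the model keeps improving as new data arrives."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def partial_fit(self, x, y: int) -> None:
        """One SGD step on the log-loss of a single example (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # d(log-loss)/dz
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error

# Hypothetical features per request: [quote characters, SQL keywords]; 1 = SQLi.
stream = [([0, 0], 0), ([2, 1], 1), ([0, 1], 0), ([4, 2], 1), ([1, 0], 0), ([3, 3], 1)]
model = OnlineLogisticRegression(n_features=2)
for x, y in stream * 50:  # replaying the stream stands in for new data arriving
    model.partial_fit(x, y)

print(round(model.predict_proba([4, 3]), 3), round(model.predict_proba([0, 0]), 3))
```

Because each update touches only one example, retraining on fresh traffic is cheap, which is what makes periodic re-training practical.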
What It Does
• Allows monitoring of calculated attributes
• Allows training on specific data fields for accuracy and feature isolation
• Seamlessly distributes trained models to all instances of Menage
Why It Matters
• ML for Proxy allows for multiple levels of automatic analysis
• Machine learning models installed by default adapt to your data and get better over time (stochastic gradient descent)
• Incoming data is enriched via trained models and Menage before index time
• ModelPipeline Framework allows you to create custom models to fit your needs
How To Use It
• Step 1: Follow the instructions in the Menage Specification document to configure Menage.
• Step 2: Configure regular expressions in props.conf if needed.
• Step 3: Train models from the “Train Models” dashboard.
  – bow(php), where php is the PHP arguments field of the URL, gives good results for SQL injection
  – Index your reference data, and evaluate change over time
• Step 4: Forward new data through Menage to have data classification appended.
• Step 5: Analyze enriched data and periodically re-train models.
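The slides name the bow(php) training option without showing its code. A minimal, hypothetical reimplementation, assuming “PHP arguments” means the decoded query-string values of requests to .php pages; it shows what a bag of words over those arguments looks like for a classic injection payload.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def php_arguments(url: str) -> list:
    """(name, value) pairs from the query string of a request to a .php page."""
    parts = urlsplit(url)
    if not parts.path.endswith(".php"):
        return []
    return parse_qsl(parts.query, keep_blank_values=True)

def bow(url: str) -> Counter:
    """Hypothetical bag of words over the PHP argument values: lowercase words
    plus single punctuation characters, since quotes and '=' matter for SQLi."""
    bag = Counter()
    for _, value in php_arguments(url):
        bag.update(re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", value.lower()))
    return bag

print(bow("http://shop.example/item.php?id=1%27%20OR%20%271%27%3D%271"))
```

parse_qsl percent-decodes the values first, so the encoded payload 1' OR '1'='1 contributes four quote tokens; training only on these arguments keeps unrelated URL parts from diluting the features.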
Step 1
• Menage must be configured on every indexer on which you want data enrichment and classification.
• The necessary conf files can either be pushed out in a distributed scenario or modified manually.
• Menage is started by executing handler_server.py and menage.go.
• Authentication is stored in a configuration file in that directory; more information can be found in the Menage Python Handler document.
Step 2
• Current regular expressions are designed for SGOS proxy data.
• Regular expressions and parameter names can be changed as needed; just remember to put the new parameter name(s) in the train command as well.
• Contents of the MLFramework folder can also be extracted into the bin directory of any app for machine learning capabilities.
Step 3
• Training the models is probably the most important step!
• Be careful with the parameters you choose to train on: too many features will decrease accuracy, and so will too few.
• Be sure to only train on features relevant to what you’re looking for
  – E.g. PHP arguments if you’re looking for SQL injection
• The extra parameter functions are really useful for specific tasks:
  – E.g. a bag-of-words approach applied to PHP arguments can be really useful for SQL injection detection
Step 4
• Forwarders must be configured to send all data to a port Menage is listening on, so that new data gets classified.
• Ideally, an instance of Menage should run on every indexer so that all of your data is enriched.
• The ports Menage listens on and sends to can be modified in the menage.ini file in Menage’s bin directory.
Step 5
• When Menage classifies incoming data, labels are appended to the event’s metadata, which can then be searched and evaluated.
  – The screenshot at the beginning of the slideshow shows the number of events classified by Menage as having SQL content, by semantic analysis and by Snort signature detection.
• Most models support incremental training and should be trained frequently on new incoming data to improve accuracy
  – This also allows the models to adapt to your network
Constraints
• Assuming independent features and algorithms, false positives will not go up when using a cascade.
• However, true positives will decrease, unless we keep the detection specialised and simple, and are therefore able to make P(A|M) = 1.0 or very close.
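The cascade argument can be written out. Assume the cascade raises an alarm A only when each of n independent detectors raises its own alarm A_i, and let M denote a malicious event (the per-detector notation is ours, not the slide's):

```latex
P(A \mid M) = \prod_{i=1}^{n} P(A_i \mid M),
\qquad
P(A \mid \neg M) = \prod_{i=1}^{n} P(A_i \mid \neg M)
```

Every factor is at most 1, so adding a stage can never raise the false-positive probability P(A | ¬M), but it lowers the true-positive probability P(A | M) as well: two stages that each detect 90% of malicious events detect only 0.9 × 0.9 = 81% together. Only stages with P(A_i | M) close to 1 keep P(A | M) close to 1.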
Assumptions
• Perfect detection is impossible.
• Threat coverage is less than 100%.
• Log feeds can sometimes fail.
• Something that is malicious *might* cause an alarm.
• The entire set of malicious events includes those we can detect, those we might detect, and some we don’t even know about.
• Of those we don’t know about, given the right circumstances, we have a chance of discovering some through statistical analysis.
• Even when we should be able to detect an event, the above constraints make this less than certain.
What can we control?
• The effectiveness of the IDS
• Coverage
• Noisy events
• Correlation algorithms
Lessons Learned
Quote Box
“A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty.”
– Winston Churchill
THANK YOU