Big Data Visualization
Raffael Marty, CEO
London, February 2015
Security. Analytics. Insight.
Overview
• Visualization
• Design Principles
• Dashboards
• SOC Dashboard
• Data Discovery and Exploration
• Data Requirements for Visualization
• Big Data Lake
I am Raffy - I do Viz!
IBM Research
Visualization
Why Visualization?
The stats ...
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
The data ...
Why Visualization?
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Human analyst:
• pattern detection
• remembers context
• fantastic intuition
• can predict
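Anscombe's quartet, linked above, is easy to reproduce. The sketch below uses only the Python standard library and the published values from Anscombe's 1973 paper, as listed on the Wikipedia page. It computes the usual summary statistics for all four datasets. They agree to two decimals, but only a plot shows that one dataset is curved, one has an outlier, and one depends on a single leverage point.

```python
from statistics import mean, variance

# Anscombe's quartet (F. J. Anscombe, 1973): datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Summary statistics that come out (nearly) identical for all four datasets."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx  # least-squares regression of y on x
    return {
        "mean_x": round(mx, 2),
        "var_x": round(variance(xs), 2),
        "mean_y": round(my, 2),
        "corr": round(sxy / (sxx * syy) ** 0.5, 2),
        "slope": round(slope, 2),
        "intercept": round(my - slope * mx, 1),
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        # The same numbers every time, yet the scatter plots look nothing alike.
        print(name, summarize(xs, ys))
```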
Visualization To …
• Present / Communicate
• Discover / Explore
Design Principles
Choosing Visualizations
• Objective
• Audience
• Data
For Example - Lateral Movement
• Objective: find attackers in the network moving laterally
  • Defines the data needed (NetFlow, sFlow, …)
  • Maybe restrict to a network segment
• Audience: security analyst, risk team, …
  • Informs how to visualize / present the data
Kill chain: Recon | Weaponize | Deliver | Exploit | Install | C2 | Act
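The following sketch shows how the objective drives the data. It assumes flow records have already been reduced to hypothetical (source, destination, port) tuples. It restricts those flows to one network segment and flags hosts that reach many distinct internal peers. The fan-out threshold stands in for a real lateral-movement detector, and the segment and addresses are made up.

```python
import ipaddress
from collections import defaultdict


def lateral_candidates(flows, segment="10.0.0.0/16", min_peers=3):
    """Return sources in `segment` that talk to at least `min_peers` distinct hosts in it.

    `flows` is an iterable of (src_ip, dst_ip, dst_port) tuples, e.g. reduced NetFlow.
    The segment and threshold are illustrative defaults, not recommendations.
    """
    net = ipaddress.ip_network(segment)
    peers = defaultdict(set)
    for src, dst, _port in flows:
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        # Only internal-to-internal traffic inside the segment matters for lateral movement.
        if s in net and d in net and s != d:
            peers[src].add(dst)
    return {
        src: sorted(dsts, key=ipaddress.ip_address)
        for src, dsts in peers.items()
        if len(dsts) >= min_peers
    }


# Hypothetical flow records: (source, destination, destination port).
FLOWS = [
    ("10.0.0.5", "10.0.0.10", 445),
    ("10.0.0.5", "10.0.0.11", 445),
    ("10.0.0.5", "10.0.0.12", 3389),
    ("10.0.0.5", "8.8.8.8", 53),  # leaves the segment, ignored
    ("10.0.0.7", "10.0.0.10", 445),
]

if __name__ == "__main__":
    print(lateral_candidates(FLOWS))
```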
Principles of Analytic Design
• Show comparisons, contrasts, differences.
• Show causality, mechanism, explanation, systematic structure.
• Show multivariate data; that is, show more than 1 or 2 variables.
by Edward Tufte
Show Context
42
42 is just a number and means nothing without context.
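One way to give a number context is to compare it with its own history. The sketch below uses made-up data to show the same value, 42, reading as unremarkable against one history and as an extreme outlier against another. The percentile and z-score are assumed choices of comparison; other baselines would work too.

```python
from statistics import mean, stdev


def in_context(value, history):
    """Place `value` relative to earlier observations of the same metric."""
    below = sum(1 for h in history if h < value)
    mu, sigma = mean(history), stdev(history)
    return {
        "value": value,
        "percentile": round(100 * below / len(history)),  # share of history below value
        "z_score": round((value - mu) / sigma, 2),         # distance from mean in std devs
    }


# Two hypothetical ten-day histories for the same metric.
NORMAL_DAYS = [40, 38, 45, 41, 39, 44, 42, 40, 43, 38]
QUIET_DAYS = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]

if __name__ == "__main__":
    print(in_context(42, NORMAL_DAYS))  # an ordinary day
    print(in_context(42, QUIET_DAYS))   # far outside anything seen before
```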
Show Context
Use Numbers to Highlight the Most Important Parts of the Data
[Figure labels: Numbers, Summaries]
Add Context
Additional information about objects, such as:
• machine: roles, criticality, location, owner, …
• user: roles, office location, …
[Figure: traffic between source and destination, annotated with machine and user context (machine role, user role)]
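One way to add this context is to enrich each event with lookups. In the sketch below, the in-memory `ASSETS` and `USERS` tables and their fields are hypothetical stand-ins for an asset inventory and a directory such as AD / LDAP.

```python
# Hypothetical context tables; in practice these would come from an asset
# inventory, AD / LDAP, or HR systems.
ASSETS = {
    "10.0.0.5": {"role": "workstation", "criticality": "low", "owner": "alice"},
    "10.0.0.20": {"role": "database server", "criticality": "high", "owner": "dba-team"},
}
USERS = {
    "alice": {"role": "engineer", "office": "London"},
}


def lookup(table, key):
    """Return a copy of the context record, or a placeholder when nothing is known."""
    return dict(table.get(key, {"role": "unknown"}))


def enrich(event):
    """Attach machine and user context to a flow or log event (the input is not modified)."""
    out = dict(event)
    out["src_ctx"] = lookup(ASSETS, event["src"])
    out["dst_ctx"] = lookup(ASSETS, event["dst"])
    out["user_ctx"] = lookup(USERS, event.get("user"))
    return out


if __name__ == "__main__":
    e = enrich({"src": "10.0.0.5", "dst": "10.0.0.20", "user": "alice", "bytes": 1200})
    print(e["src_ctx"]["role"], "->", e["dst_ctx"]["role"], "by", e["user_ctx"]["role"])
```

Enriching at ingest keeps queries simple but freezes the context as it was when the event arrived. Joining at query time keeps the context current, at the cost of heavier queries.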
Traffic Flow Analysis With Context
Aesthetics Matter
• Black background
• Blue or green colors
• Glow
http://www.scifiinterfaces.com/
B O R I N G
Sexier
Dashboard Design Principles
• Audience, audience, audience!
• Comprehensive information (enough context)
• Highlight important data
• Use graphics when appropriate
• Good choice of graphics and design
• Aesthetically pleasing
• Enough information to decide whether action is necessary
• No scrolling
• Real-time vs. batch? (refresh rates)
• Clear organization
SOC Dashboards
Mostly Blank
Dashboards For Discovery
• Disappears too quickly
• Analysts' focus is on their own screens
• SOC dashboard just distracts
• Detailed information not legible
• Put the detailed dashboards on the analysts' screens!
Use SOC Dashboard For Context
Environment informs detection policies
• Provide analyst with context
• “What else is going on in the environment right now?”
• Bring into focus
  • Turn something benign into something interesting
• Disprove
  • Turn something interesting into something benign
Show Comparisons
[Chart: current measure compared with the week prior]
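The comparison in the chart is straightforward to compute. The sketch below assumes a hypothetical list of daily values, oldest first. It compares the latest day with the same weekday one week earlier, so weekday and weekend patterns do not distort the result.

```python
def week_over_week(daily):
    """Compare the most recent day with the same weekday one week earlier.

    `daily` is a list of per-day values, oldest first.
    """
    if len(daily) < 8:
        raise ValueError("need at least 8 days of data")
    current, prior = daily[-1], daily[-8]
    delta = current - prior
    # Percent change is undefined when the prior value is zero.
    pct = None if prior == 0 else round(100 * delta / prior, 1)
    return {"current": current, "week_prior": prior, "delta": delta, "pct_change": pct}


# Hypothetical daily counts of firewall blocks (oldest first).
FIREWALL_BLOCKS = [1100, 900, 950, 1000, 980, 400, 380, 1320]

if __name__ == "__main__":
    print(week_over_week(FIREWALL_BLOCKS))
```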
What To Put on Screens
Provide context to individual security alerts
• News feed summary (FS-ISAC feeds, mailing lists, threat feeds)
• Monitoring Twitter or IRC for certain activity / keywords
• Volumes or metrics (e.g., # firewall blocks, # IDS alerts, # failed transactions)
• Top N metrics:
  • Top 10 suspicious users
  • Top 10 servers connecting outbound
http://raffy.ch/blog/2015/01/15/dashboards-in-the-security-opartions-center-soc/
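The top N lists above reduce to counting and sorting. The sketch below handles "top servers connecting outbound". It assumes hypothetical (source, destination) connection records and treats 10.0.0.0/8 as the internal address space.

```python
import ipaddress
from collections import Counter

INTERNAL = ipaddress.ip_network("10.0.0.0/8")  # assumption: internal address space


def top_outbound(connections, n=10):
    """Rank internal hosts by the number of connections they open to external addresses."""
    counts = Counter(
        src
        for src, dst in connections
        if ipaddress.ip_address(src) in INTERNAL and ipaddress.ip_address(dst) not in INTERNAL
    )
    return counts.most_common(n)


# Hypothetical (source, destination) connection records.
CONNECTIONS = (
    [("10.0.0.20", "203.0.113.5")] * 3
    + [("10.0.0.22", "198.51.100.7")] * 2
    + [("10.0.0.21", "203.0.113.9")]
    + [("10.0.0.20", "10.0.0.30")]  # internal-only, not counted
)

if __name__ == "__main__":
    print(top_outbound(CONNECTIONS, n=2))
```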
Data Discovery & Exploration
Visualize Me Lots (>1TB) of Data
Information Visualization Mantra
Overview first, zoom and filter, then details on demand
Principle by Ben Shneiderman
• summary / aggregation
• data mining
• signal detection (IDS, behavioral, etc.)
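The three steps of the mantra map onto three operations over event data. In this sketch the event tuples, the hourly bucket size, and the function names are assumptions. With big data, the overview step would run as an aggregation in the backend (the summary / aggregation listed above) rather than in memory.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed events: (timestamp, host, message).
EVENTS = [
    (datetime(2015, 2, 10, 9, 5), "web01", "login failed"),
    (datetime(2015, 2, 10, 9, 40), "web01", "login failed"),
    (datetime(2015, 2, 10, 9, 41), "db01", "query timeout"),
    (datetime(2015, 2, 10, 10, 2), "web01", "login ok"),
]


def overview(events):
    """Overview: counts per hour. The result stays small no matter how many events there are."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts, _, _ in events)


def zoom_filter(events, start, end, host=None):
    """Zoom / filter: keep events in [start, end), optionally for a single host."""
    return [e for e in events if start <= e[0] < end and (host is None or e[1] == host)]


def details(events, limit=20):
    """Details on demand: hand back the raw records for the current selection."""
    return events[:limit]


if __name__ == "__main__":
    print(overview(EVENTS))
    hour = zoom_filter(EVENTS, datetime(2015, 2, 10, 9), datetime(2015, 2, 10, 10), host="web01")
    for record in details(hour):
        print(record)
```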
Visualization Challenges
• Access to data
• Parsed data and data context
• Data architecture for central data access and fast queries
• Application of data mining (how?, what?, scalable, …)
• Visualization tools that support:
  • Complex visual types (parallel coordinates, treemaps, heat maps, link graphs)
  • Linked views
  • Data mining (clustering, …)
  • Collaboration, information sharing
  • Visual analytics workflow
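The sketch below illustrates one of these visual types. It turns hypothetical communication pairs into the weighted edges and node degrees a link graph needs, then writes them as Graphviz DOT. Aggregating first keeps the graph legible, because repeated connections collapse into one edge whose width reflects the count.

```python
from collections import Counter


def link_graph(pairs):
    """Collapse repeated (src, dst) pairs into weighted edges and count each node's degree."""
    edges = Counter(pairs)
    degree = Counter()
    for src, dst in edges:  # each distinct edge counts once per endpoint
        degree[src] += 1
        degree[dst] += 1
    return dict(edges), dict(degree)


def to_dot(edges):
    """Render weighted edges in Graphviz DOT; thicker lines mean more connections."""
    lines = ["digraph flows {"]
    for (src, dst), weight in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [penwidth={weight}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical communication pairs.
PAIRS = [
    ("10.0.0.5", "10.0.0.20"),
    ("10.0.0.5", "10.0.0.20"),
    ("10.0.0.5", "10.0.0.7"),
    ("10.0.0.7", "10.0.0.20"),
]

if __name__ == "__main__":
    edges, degree = link_graph(PAIRS)
    print(to_dot(edges))
```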
Big Data Lake
The Big Data Lake
• One central location to store all cyber security data
• “Data collected only once and third-party software leveraging it”
• Scalability and interoperability
• More than deploying an off-the-shelf product from a vendor
• Data use influences both data formats and technologies to store the data:
  • search, analytics, relationships, and distributed processing
  • correlation and statistical summarization
• What to do with context? Enrich or join?
• Hard problems:
  • Parsing: can you re-parse? Common naming scheme!
  • Data store capabilities (search, analytics, distributed processing, etc.)
  • Access to data: SQL (even in a Hadoop context); how can products access the data?
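The sketch below illustrates the two parsing points. It keeps the raw line next to the parsed fields so a record can be re-parsed later, and it maps each source's field names onto one common naming scheme. The `COMMON_NAMES` mapping, the field names, and the iptables-style sample line are hypothetical.

```python
import re

# Hypothetical mapping from each source's field names to one common naming scheme.
COMMON_NAMES = {
    "fw": {"SRC": "src_ip", "DST": "dst_ip", "DPT": "dst_port", "PROTO": "protocol"},
}
KEY_VALUE = re.compile(r"(\w+)=(\S*)")


def parse(source, raw, parser_version=1):
    """Parse key=value pairs, rename fields to the common scheme, and keep the raw line."""
    mapping = COMMON_NAMES.get(source, {})
    fields = {mapping.get(k, k): v for k, v in KEY_VALUE.findall(raw)}
    # Metadata goes last so a field named "raw" or "source" cannot overwrite it.
    return {**fields, "source": source, "raw": raw, "parser_version": parser_version}


def reparse(record, parser_version):
    """Because the raw line is stored, any record can be parsed again with a newer parser."""
    return parse(record["source"], record["raw"], parser_version)


if __name__ == "__main__":
    # iptables-style log line with hypothetical values.
    rec = parse("fw", "IN=eth0 OUT= SRC=10.0.0.5 DST=10.0.0.20 PROTO=TCP DPT=445")
    print(rec["src_ip"], rec["dst_ip"], rec["dst_port"], rec["protocol"])
```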
Federated Data Access
[Diagram components: SIEM (SIEM connector, SIEM console), dispatcher, data lake, and data sources: Prod A, Prod B, AD / LDAP, HR, IDS, FW, DBs, SNMP, …]
Caveats:
• Dispatcher?
• Standard access to dispatcher / products enabled
• Data lake technology?
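The caveats leave the dispatcher's design open. Purely as an illustration, one could picture it as a registry that routes each query to the backends holding the requested data type and merges their answers. The `Dispatcher` class, the data-type names, and the lambda backends (standing in for the data lake or a product API) are all hypothetical.

```python
from typing import Callable, Dict, List

# A backend takes a query string and returns matching records.
Backend = Callable[[str], List[dict]]


class Dispatcher:
    """Hypothetical federated-query dispatcher that routes queries by data type."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Backend]] = {}

    def register(self, data_type: str, backend: Backend) -> None:
        """Declare that `backend` can answer queries about `data_type`."""
        self._routes.setdefault(data_type, []).append(backend)

    def query(self, data_type: str, expr: str) -> List[dict]:
        """Send the query to every registered backend and merge the results."""
        backends = self._routes.get(data_type)
        if not backends:
            raise KeyError(f"no backend registered for {data_type!r}")
        results: List[dict] = []
        for backend in backends:
            results.extend(backend(expr))
        return results


if __name__ == "__main__":
    d = Dispatcher()
    # Stand-ins for the data lake and a product API; real connectors would go here.
    d.register("ids", lambda q: [{"from": "data lake", "query": q}])
    d.register("ids", lambda q: [{"from": "IDS", "query": q}])
    d.register("identity", lambda q: [{"from": "AD / LDAP", "query": q}])
    print(d.query("ids", "src=10.0.0.5"))
```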
Multiple Data Stores
[Diagram components: raw logs, (un)-structured data, and context; queue; real-time processing; distributed processing; storage; key-value, structured, index, graph, and stats stores; SQL and access layers]
Caveat:
• Need multiple types of data stores
Technologies (Example)
[Diagram components: raw logs, (un)-structured data, and context; queue (Kafka); real-time processing (Spark); distributed processing (Spark); HDFS; key-value (Cassandra); columnar (Parquet); index (Elasticsearch); graph (GraphX); aggregates; SQL (Impala, SparkSQL); access]
Caveat:
• No out-of-the-box solution available
SIEM Integration - Log Management First
[Diagram components: SIEM (SIEM connector, SIEM console); raw logs into log management (columnar store, search engine, or log management product); processing / filtering on HDFS, e.g., Pig parsing; SQL or search interface]
Simple SIEM Integration
[Diagram: the SIEM connector forwards raw, CSV, or JSON log data through Flume into HDFS; SQL (Impala, with SerDe) exposes it through an SQL interface to Tableau, etc.; SIEM console]
Requirement:
• SIEM connector to forward text-based data to Flume.
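The sketch below shows the idea behind an SQL interface on top of text log data. It uses Python's built-in `sqlite3` and `csv` modules as a local stand-in and does not use Flume, HDFS, or Impala. It loads a few hypothetical CSV log lines into a table and answers a question in plain SQL, the kind of query a tool such as Tableau would send.

```python
import csv
import io
import sqlite3

# Hypothetical CSV log data as it might be forwarded by the SIEM connector.
RAW = """ts,src,dst,action
2015-02-10T09:05:00,10.0.0.5,10.0.0.20,deny
2015-02-10T09:06:00,10.0.0.5,10.0.0.21,deny
2015-02-10T09:07:00,10.0.0.7,10.0.0.20,allow
"""


def load(raw: str) -> sqlite3.Connection:
    """Load CSV text into an in-memory table; sqlite3 stands in for Impala here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (ts TEXT, src TEXT, dst TEXT, action TEXT)")
    rows = csv.DictReader(io.StringIO(raw))  # one dict per line, keyed by the header
    conn.executemany("INSERT INTO logs VALUES (:ts, :src, :dst, :action)", rows)
    return conn


conn = load(RAW)
denied = conn.execute(
    "SELECT src, COUNT(*) AS n FROM logs WHERE action = 'deny' GROUP BY src ORDER BY n DESC"
).fetchall()

if __name__ == "__main__":
    print(denied)
```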
SIEM Integration - Advanced
[Diagram components: SIEM (SIEM connector, SIEM console); raw logs, syslog data, and other data sources; queue (Kafka); processing; HDFS; columnar (Parquet); index (Elasticsearch); SQL (Impala, SparkSQL); access through an SQL and search interface (Tableau, Kibana, etc.)]
Note: requires parsing and formatting in a SIEM-readable format (e.g., CEF).
What I am Working On
[Screenshots: tabs Data Stores, Analytics, Forensics, Models, Admin; a list of connections between IP addresses; an anomaly view with time-series decomposition (data, seasonal, trend) and anomaly details; “Hunt”, Explain, and Visual Search views]
• Big data backend
• Own visualization engine (web-based)
• Visualization workflows
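The screenshots show a time series decomposed into data, seasonal, and trend components, with anomalies flagged. The slides do not describe the product's method. As a generic sketch, the code below performs an additive decomposition, using a centered moving median instead of the classical moving-average trend so that a single spike does not leak into neighboring points. It then flags points whose residual is far from typical, measured by the median absolute deviation. The period, threshold, and synthetic data are assumptions.

```python
import random
from statistics import median


def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual.

    Robust variant of the classical method: the trend is a centered moving median
    and the seasonal profile is the per-phase median of the detrended values.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    w = period if period % 2 else period + 1  # odd window so it can be centered
    half = w // 2
    trend = [None] * n  # undefined at the edges, where the window does not fit
    for i in range(half, n - half):
        trend[i] = median(series[i - half:i + half + 1])
    by_phase = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            by_phase[i % period].append(series[i] - t)
    profile = [median(v) for v in by_phase]
    center = sum(profile) / period  # make the seasonal component sum to zero
    seasonal = [profile[i % period] - center for i in range(n)]
    resid = [None if t is None else series[i] - t - seasonal[i] for i, t in enumerate(trend)]
    return trend, seasonal, resid


def anomalies(series, period, k=3.5):
    """Indices whose residual is more than k robust standard deviations from the median."""
    _, _, resid = decompose(series, period)
    vals = [r for r in resid if r is not None]
    med = median(vals)
    mad = median(abs(r - med) for r in vals)
    scale = 1.4826 * mad or 1e-9  # 1.4826 * MAD estimates sigma for Gaussian noise
    return [i for i, r in enumerate(resid) if r is not None and abs(r - med) / scale > k]


# Hypothetical daily counts: upward trend, a bump on two days of each week, noise, one spike.
rng = random.Random(42)
series = [100 + 0.5 * i + (20 if i % 7 in (5, 6) else 0) + rng.gauss(0, 1) for i in range(42)]
series[24] += 50

if __name__ == "__main__":
    print(anomalies(series, 7))
```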
BlackHat Workshop
Visual Analytics - Delivering Actionable Security Intelligence
August 1-6, 2015, Las Vegas, USA
big data | analytics | visualization
Security Visualization Community
Share, discuss, challenge, and learn about security visualization.
http://secviz.org
List: secviz.org/mailinglist
Twitter: @secviz
Further resources:
http://slideshare.net/zrlram
http://secviz.org and @secviz