Big Data Technologies for InfoSec
Dive Deeper. See Further.
Ram Sripracha
UCLA / Sift Security
Experiences
RR Systems
What are “Big Data” systems?
• XXL in Size
  • Data Volume
  • TBs to PBs
• Computation Scalability
  • Horizontally Scalable
  • Multi-host Deployment
  • Commodity Hardware
Why now?
• Rich Ecosystem
  • Well-Supported Open Source Software
• High Adoption Rate
  • Commercial Backing
  • “Red Hat” Model
• Heavily Invested
Platform Providers
Technologies
Is it a “Big Data” problem?
• Many moving parts
• Can be overwhelming at first
• 100s of configuration settings
• Requires some level of expertise
• Overkill for some problems
• Larger resource footprint
Big Data Stack
DFS
• NoSQL
• Columnar
• Sits on HDFS
• Millions of Rows × Millions of Columns
• Cell-level Security
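"Cell-level security" means every individual cell carries its own access label, so two analysts scanning the same row can see different columns. The slide does not say which store it describes, so the following is only a minimal in-memory sketch of the idea, not the API of any real datastore. The class names, the "family:qualifier" column naming, and the rule that a caller must hold every label on a cell are all illustrative assumptions; real systems typically support richer boolean label expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str  # hypothetical "family:qualifier" naming
    value: str
    visibility: frozenset = frozenset()  # labels required to read; empty = public


class CellSecuredTable:
    """Hypothetical in-memory wide-column table with per-cell visibility labels."""

    def __init__(self):
        self._cells = {}  # (row, column) -> Cell

    def put(self, row, column, value, visibility=()):
        self._cells[(row, column)] = Cell(row, column, value, frozenset(visibility))

    def scan(self, authorizations):
        """Yield cells in (row, column) order, skipping any the caller may not see."""
        auths = set(authorizations)
        for key in sorted(self._cells):
            cell = self._cells[key]
            # Simplification: a cell is visible only if the caller holds every label.
            if cell.visibility <= auths:
                yield cell
```

For example, after `table.put("host-42", "flow:bytes", "1048576")` and `table.put("host-42", "user:name", "alice", visibility={"pii"})`, a scan with no authorizations returns only the `flow:bytes` cell, while a scan with `["pii"]` returns both.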
Titan
• Graph-based Datastore
• Optimized for Edges and Vertices (E, V)
• Key/Value attributes for vertices and edges
• 100s of millions of vertices × 100s of billions of edges
• Capturing relationships
• Sits on top of HBase, Cassandra, …
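The property-graph model described above (vertices and edges that both carry key/value attributes) can be illustrated with a toy in-memory structure. This is not Titan's API (Titan is queried through Gremlin traversals); the class, method names, and the user/host example are hypothetical, chosen to show how a relationship such as "user logged into host, host connected to workstation" becomes a two-hop traversal.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: vertices and edges both carry key/value attributes."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> attribute dict
        self.out_edges = defaultdict(list)   # vertex id -> [(label, dst, attrs)]

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = attrs

    def add_edge(self, src, label, dst, **attrs):
        self.out_edges[src].append((label, dst, attrs))

    def neighbors(self, vid, label=None):
        """Destination ids of outgoing edges, optionally filtered by edge label."""
        return [dst for (lbl, dst, _) in self.out_edges[vid]
                if label is None or lbl == label]


g = PropertyGraph()
g.add_vertex("alice", kind="user", dept="radiology")
g.add_vertex("10.0.0.5", kind="host")
g.add_vertex("ws-17", kind="host")
g.add_edge("alice", "logged_into", "10.0.0.5", count=3)
g.add_edge("10.0.0.5", "connected_to", "ws-17", port=445)

# Two-hop traversal: which hosts were reached from hosts alice logged into?
two_hop = [h2 for h in g.neighbors("alice", "logged_into")
           for h2 in g.neighbors(h, "connected_to")]
```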
Map-Reduce
• Resilient Distributed Dataset (RDD)
• In-Memory RDD
• Iterative Algorithms
• Machine Learning
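In-memory datasets help iterative algorithms because the same working set is read on every pass, so keeping it in memory avoids re-reading it from disk each iteration. PageRank is a standard example of such an algorithm. The sketch below is plain Python written as repeated map and reduce steps, not the Spark RDD API; in an RDD-based version, `links` would play the role of the dataset that is cached across iterations. As a simplifying assumption, it does not redistribute rank from pages with no outgoing links.

```python
from collections import defaultdict


def pagerank(links, iterations=20, damping=0.85):
    """Iterative PageRank as repeated map/reduce steps over an in-memory dict.

    links: page -> list of outgoing links. Assumes every page has at least one
    outgoing link; otherwise its rank is simply dropped (no dangling handling).
    """
    pages = set(links) | {dst for dsts in links.values() for dst in dsts}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Map: each page sends rank / out-degree to every page it links to.
        contribs = defaultdict(float)
        for page, dsts in links.items():
            for dst in dsts:
                contribs[dst] += ranks[page] / len(dsts)
        # Reduce: combine incoming contributions with the damping term.
        ranks = {p: (1 - damping) / n + damping * contribs[p] for p in pages}
    return ranks
```

For `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, the ranks converge so that `c` (linked from both other pages) ranks highest.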
Impala
• Near-real-time analysis
• Micro-batch processing
• Pipelining of micro-batches
• Stream annotations
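Micro-batch processing treats a stream as a sequence of small batches, each pushed through the same pipeline of stages; "stream annotation" enriches each event with context as it passes. The sketch below assumes events are dicts with a `src` field and that annotation means looking up an owner for the source address. Both the field names and the lookup table are hypothetical, and it batches by event count for simplicity, whereas streaming engines usually batch by time interval.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Group a (possibly unbounded) event iterator into fixed-size micro-batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def annotate(batch, lookup):
    """Stream annotation: attach an owner (hypothetical lookup) to each event."""
    return [dict(event, owner=lookup.get(event["src"], "unknown")) for event in batch]


def pipeline(events, lookup, batch_size=2):
    # Batches flow through the stages one at a time; the stream is never fully materialized.
    for batch in micro_batches(events, batch_size):
        yield annotate(batch, lookup)
```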
• Sits on top of
• Distributed indexing and search
• Indexes
  • Raw text files from HDFS
  • HBase content
  • Titan properties
  • Other replicated data streams
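The core structure behind indexing and search is an inverted index, which maps each token to the documents that contain it. A distributed search engine shards such an index across nodes, but the lookup logic is the same. This single-process sketch indexes documents from several sources. The source-prefixed document ids (`hdfs:`, `hbase:`, `titan:`) and the tokenization rule are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase runs of letters, digits, '_', '.', '-' (keeps IPs whole).
TOKEN = re.compile(r"[a-z0-9_.\-]+")


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """AND query: ids of documents containing every query token."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "hdfs:auth.log": "Failed password for root from 10.0.0.7",
    "hbase:row17": "Accepted password for alice from 10.0.0.8",
    "titan:v42": "host 10.0.0.7 user root",
}
index = build_index(docs)
```

Here `search(index, "root 10.0.0.7")` returns the HDFS log line and the Titan vertex, since both contain both tokens.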
Application Log Search
• Full-text indexes
• Flexible faceting
• Automatic field extraction
• Dashboard-able search interface
• Low-cost alternative to Splunk and other search solutions
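Automatic field extraction and faceting work together: fields pulled out of raw log lines become the dimensions a search interface can count and filter on. The sketch below assumes logs use a `key=value` convention (with optional double-quoted values), which is only one of many formats a real search product would recognize. Facet counts are simple value tallies over the matching lines.

```python
import re
from collections import Counter

# Assumed log convention: key=value pairs, values optionally in double quotes.
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(line):
    """Automatic field extraction for key=value style log lines."""
    return {k: v.strip('"') for k, v in KV.findall(line)}


def facet(lines, field):
    """Count how many lines carry each value of `field`."""
    counts = Counter()
    for line in lines:
        value = extract_fields(line).get(field)
        if value is not None:
            counts[value] += 1
    return counts


logs = [
    "ts=1 user=alice action=login status=ok",
    "ts=2 user=bob action=login status=fail",
    'ts=3 user=bob action=login status=fail msg="bad password"',
]
```

With these lines, `facet(logs, "status")` counts two failures and one success, which is the kind of breakdown a dashboard would display next to the search results.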
Real-time Blacklist Alerting
• Fault tolerance
• Netflow annotation
• Match alerting
• Application access alerting
• Authentication alerting
• Network metrics
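Netflow annotation and match alerting can be sketched with Python's standard `ipaddress` module: each flow is annotated with which endpoints fall inside a blacklisted address or CIDR block, and only matching flows raise alerts. The flow dict layout (`src`, `dst`, `dport`) and the blacklist entries are hypothetical. The linear scan over the blacklist is a simplification; a large blacklist would normally use a prefix-tree lookup.

```python
import ipaddress


def load_blacklist(entries):
    """Parse blacklist entries (single IPs or CIDR blocks) into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def annotate_flow(flow, blacklist):
    """Return a copy of the flow with a 'blacklisted' list of matching endpoints."""
    hits = []
    for side in ("src", "dst"):
        addr = ipaddress.ip_address(flow[side])
        if any(addr in net for net in blacklist):
            hits.append(side)
    return dict(flow, blacklisted=hits)


def alerts(flows, blacklist):
    """Yield only annotated flows with at least one blacklisted endpoint."""
    for flow in flows:
        annotated = annotate_flow(flow, blacklist)
        if annotated["blacklisted"]:
            yield annotated


blacklist = load_blacklist(["203.0.113.0/24", "198.51.100.7"])
flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "dport": 443},
    {"src": "10.0.0.6", "dst": "8.8.8.8", "dport": 53},
    {"src": "198.51.100.7", "dst": "10.0.0.5", "dport": 22},
]
```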
Netflow Data Warehouse
• 3 nodes
• 2x 8-core Intel Xeon E5-2450 per node
• 16 GB RAM per node
• 72 TB storage total
• ~5B Netflow records/day
• >1 year retention
• Supports complex SQL-like queries
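The figures on this slide imply a tight storage budget per record. The arithmetic below takes one year as the retention floor and reads "72TB" as decimal terabytes. It also shows the effect of 3x replication, which is HDFS's default but is an assumption here, since the slide does not state the replication factor. Under these assumptions the budget is tens of bytes per record or less, which suggests why compact, compressed storage matters at this scale.

```python
RECORDS_PER_DAY = 5e9        # "~5B Netflow records/day" (from the slide)
RETENTION_DAYS = 365         # ">1 year retention": one year taken as the floor
RAW_CAPACITY_BYTES = 72e12   # "72TB storage total", assumed decimal terabytes


def bytes_per_record(replication=1):
    """Storage budget per retained record for a given replication factor."""
    total_records = RECORDS_PER_DAY * RETENTION_DAYS  # 1.825e12 records
    return RAW_CAPACITY_BYTES / replication / total_records


print(f"{bytes_per_record():.1f} bytes/record with no replication")   # 39.5
print(f"{bytes_per_record(3):.1f} bytes/record with 3x replication")  # 13.2
```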
• Continuous scanning
• Direct querying of delimited files
• Metrics and diffs
• Trend computation
• Firewall rule validation
• Long retention
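Metrics, diffs, trending, and firewall rule validation reduce to simple aggregations over flow records; in the warehouse these would be SQL-like queries, but the logic can be sketched in plain Python. The flow fields (`day`, `dst`, `dport`) and the `(cidr, port)` allow-rule format are hypothetical stand-ins for whatever schema and firewall configuration are actually in use.

```python
import ipaddress
from collections import Counter


def daily_counts(flows):
    """Metric: number of flows per day (days as ISO date strings)."""
    return Counter(f["day"] for f in flows)


def day_over_day(counts):
    """Diff: change in each day's count versus the previous day, in date order."""
    days = sorted(counts)
    return {d: counts[d] - counts[p] for p, d in zip(days, days[1:])}


def rule_violations(flows, allowed):
    """Flows whose (dst, dport) is not covered by any (cidr, port) allow rule."""
    rules = [(ipaddress.ip_network(cidr), port) for cidr, port in allowed]
    return [f for f in flows
            if not any(ipaddress.ip_address(f["dst"]) in net and f["dport"] == port
                       for net, port in rules)]


flows = [
    {"day": "2015-03-01", "dst": "10.1.0.5", "dport": 443},
    {"day": "2015-03-02", "dst": "10.1.0.5", "dport": 443},
    {"day": "2015-03-02", "dst": "10.2.0.9", "dport": 23},
]
allowed = [("10.1.0.0/16", 443)]
```

With this sample, traffic rises by one flow between the two days, and the Telnet flow to `10.2.0.9` is reported because no allow rule covers it.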
DFS
EMR Access Anomalies
• Category of insider threat
• Relational networks of
  • Users/Groups
  • Department
  • Document Access
• Community structure-based anomaly detection
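One simple way to use community structure for access anomalies is to ask whether anyone else in a user's community touched the same document: an access that no peer shares stands out. The sketch below is a simplified stand-in, not the method from the talk. It uses each user's department directly as the community, where a real system might instead detect communities from the user/group/document graph. The user names, departments, and document ids are hypothetical.

```python
from collections import defaultdict


def peer_support_scores(accesses, department):
    """Score each (user, document) access by how common it is among the user's peers.

    accesses:   iterable of (user, document) pairs
    department: user -> department, used here as a stand-in for a detected community
    Returns {(user, document): fraction of the user's other department members who
    also accessed the document}, or None when the user has no peers to compare with.
    """
    accesses = list(accesses)
    readers = defaultdict(set)   # document -> users who accessed it
    for user, doc in accesses:
        readers[doc].add(user)
    members = defaultdict(set)   # department -> its users
    for user, dept in department.items():
        members[dept].add(user)

    scores = {}
    for user, doc in accesses:
        peers = members[department[user]] - {user}
        scores[(user, doc)] = len(readers[doc] & peers) / len(peers) if peers else None
    return scores


def flag_anomalies(scores, threshold=0.0):
    """Accesses whose peer support is at or below the threshold (unscored ones skipped)."""
    return sorted(k for k, s in scores.items() if s is not None and s <= threshold)


department = {"alice": "cardiology", "bob": "cardiology", "carol": "cardiology",
              "dave": "billing", "erin": "billing"}
accesses = [("alice", "chart-1"), ("bob", "chart-1"), ("carol", "chart-1"),
            ("carol", "chart-9"), ("dave", "invoice-3"), ("erin", "invoice-3"),
            ("dave", "chart-1")]
```

In this example, a billing user opening a clinical chart that no other billing user accessed receives a peer-support score of 0 and is flagged, as is a chart opened by only one member of cardiology.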