Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark based Lambda Architecture
Uploaded by: hadoop-summit
© Verizon 2016 All Rights Reserved. Information contained herein is provided AS IS and subject to change without notice. All trademarks used herein are property of their respective owners.
Near real-time network anomaly detection and traffic analysis
Pankaj Rastogi, Tech Manager
Debasish Das, Data Scientist
Agenda
• Network data overview
• DDoS as network anomaly
• Design challenges
• Trapezium overview
• Results
• Q&A
Network: Aggregated data overview
• Simple Network Management Protocol (SNMP)
  - Network management console
  - Network devices (routers, bridges, intelligent hubs)
• Data collection: aggregated per router interface
• Inbound and outbound traffic statistics sampled at regular intervals
  - Bits per second (bps)
  - Packets per second (pps)
  - CPU
  - Memory
[Figure: an SNMP manager collecting SNMP statistics from routers over the SNMP protocol]
Network: Flow data overview
[Figure: a web browser (192.168.1.10) and a web server (10.1.2.3) joined by a TCP connection, with request flow #1 and response flow #2]
• Flow #1
  - Source address 192.168.1.10
  - Destination address 10.1.2.3
  - Source port 1025
  - Destination port 80
  - Protocol TCP
• Flow #2
  - Source address 10.1.2.3
  - Destination address 192.168.1.10
  - Source port 80
  - Destination port 1025
  - Protocol TCP
• A single flow may consist of several packets and many bytes
• A TCP connection consists of two flows
  - Each flow mirrors the other
  - TCP flags can be used to determine the client and the server
• ICMP, UDP and other IP protocol streams may contain one or two flows
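The mirroring of the two flows in a TCP connection can be sketched as a 5-tuple key with a reverse operation. This is a minimal illustration, not code from the talk; the `FlowKey` class and `same_connection` helper are hypothetical names, and the sketch assumes the response flow swaps both addresses and ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """5-tuple that identifies one unidirectional flow (hypothetical type)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str

    def reversed(self) -> "FlowKey":
        """Return the key of the mirrored flow: addresses and ports swapped."""
        return FlowKey(self.dst_addr, self.src_addr,
                       self.dst_port, self.src_port, self.protocol)


def same_connection(a: FlowKey, b: FlowKey) -> bool:
    """True when the two flows are opposite directions of one connection."""
    return a == b.reversed()


# The browser-to-server request and the server-to-browser response.
request = FlowKey("192.168.1.10", "10.1.2.3", 1025, 80, "TCP")
response = FlowKey("10.1.2.3", "192.168.1.10", 80, 1025, "TCP")
print(same_connection(request, response))
```

Pairing flows this way only groups the two directions; deciding which side is the client would still rely on the TCP flags.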
DDoS as network anomaly
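[Figure: an attacker uses remote command & control to direct bots, whose traffic passes through a router to the customer. Labels: attacker + bots + customer locations; attacker + bots + customer IPs; customer + volumetric attack magnitude. Data sources shown: NetFlow, SNMP]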
SNMP
Anomaly detection on time series
Nonparametric models for SNMP DDoS detection
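The slides do not say which nonparametric model was used. As one possible illustration, the sketch below flags a sample when it lies far from the median of the preceding window, scaled by the median absolute deviation (MAD). The function name, window size, threshold and sample values are all assumptions.

```python
from statistics import median


def mad_anomalies(series, window=12, threshold=5.0):
    """Return indices of points far from the median of the preceding window.

    The median absolute deviation (MAD) is used as the scale, so no
    distribution is assumed for the samples.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        # Guard against a flat history, where MAD is zero.
        scale = mad if mad > 0 else 1e-9
        if abs(series[i] - center) / scale > threshold:
            flagged.append(i)
    return flagged


# Made-up bits-per-second samples for one router interface.
bps = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 102, 100, 950, 101]
print(mad_anomalies(bps))
```

Because the median and MAD are robust to outliers, the spike itself barely shifts the baseline used for the samples that follow it.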
SNMP
Network Analysis on SNMP
• Usage of each router/interface
• Find routers that have high packet flow
Anomaly detection on high frequency data
Parametric models for NetFlow DDoS detection
• Generate customer IP focused features based on DDoS definition
[Figure: NetFlow flow count per minute on 9/14/15 from 0:00 to 3:59; x-axis: time, y-axis: flow (0 to 300,000)]
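Customer-IP-focused features could be built by grouping flow records per destination IP and minute. The sketch below is an assumption about what such features might look like (flow count, packets, bytes, distinct sources); the record keys and the `customer_features` name are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def customer_features(flows):
    """Aggregate flow records into per (destination IP, minute) features.

    Each flow is a dict with the hypothetical keys:
    minute, src_addr, dst_addr, packets, bytes.
    """
    acc = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0, "sources": set()})
    for f in flows:
        slot = acc[(f["dst_addr"], f["minute"])]
        slot["flows"] += 1
        slot["packets"] += f["packets"]
        slot["bytes"] += f["bytes"]
        slot["sources"].add(f["src_addr"])
    # Replace the set of sources by its size so the result is a plain feature row.
    return {
        key: {"flows": v["flows"], "packets": v["packets"], "bytes": v["bytes"],
              "distinct_sources": len(v["sources"])}
        for key, v in acc.items()
    }


flows = [
    {"minute": "0:00", "src_addr": "192.168.1.10", "dst_addr": "10.1.2.3", "packets": 4, "bytes": 400},
    {"minute": "0:00", "src_addr": "192.168.1.11", "dst_addr": "10.1.2.3", "packets": 6, "bytes": 900},
    {"minute": "0:01", "src_addr": "192.168.1.10", "dst_addr": "10.1.2.3", "packets": 1, "bytes": 60},
]
features = customer_features(flows)
print(features[("10.1.2.3", "0:00")])
```

A sudden rise in flows and distinct sources towards one destination is the kind of pattern a volumetric attack would be expected to produce.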
NetFlow
Network Analysis on NetFlow
• Find customer with maximum upload bytes
• Find customer with maximum download bytes
• Find peak usage for given customer
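The three queries can be sketched on plain Python records. The record layout `(customer, minute, upload_bytes, download_bytes)` and the sample values are assumptions for illustration; in the talk these analyses run on Spark.

```python
from collections import defaultdict


def usage_summary(records):
    """records: iterable of (customer, minute, upload_bytes, download_bytes)."""
    upload = defaultdict(int)
    download = defaultdict(int)
    per_minute = defaultdict(lambda: defaultdict(int))
    for customer, minute, up, down in records:
        upload[customer] += up
        download[customer] += down
        per_minute[customer][minute] += up + down
    top_upload = max(upload, key=upload.get)
    top_download = max(download, key=download.get)
    # Peak usage: the busiest minute of each customer, as (minute, bytes).
    peak = {c: max(m.items(), key=lambda kv: kv[1]) for c, m in per_minute.items()}
    return top_upload, top_download, peak


records = [
    ("cust-a", "0:00", 500, 100),
    ("cust-a", "0:01", 200, 900),
    ("cust-b", "0:00", 50, 2000),
    ("cust-b", "0:01", 40, 10),
]
top_upload, top_download, peak = usage_summary(records)
print(top_upload, top_download, peak["cust-a"])
```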
Why we chose Apache Spark
• Good support for machine learning algorithms
• Spark’s micro-batching capabilities
  > Sufficient for our streaming requirements
• Vibrant Spark community
• Excellent talent availability within our group
Lessons learned -- Spark
• Coalesce partitions when writing to HDFS
• A seemingly harmless action like take(1) can result in huge costs
• Multiple actions on a DataFrame/DStream result in multiple jobs
• Spark DStream checkpointing with RDD models
• spark.sql.parquet.compression.codec = snappy
• spark.sql.shuffle.partitions = 2000+ when partition block size crosses 2 GB
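One way to reason about the shuffle-partition setting is to size partitions from the expected shuffle volume, so that no partition comes near the 2 GB level mentioned above. The heuristic below is an illustrative assumption, not the authors' rule: the 128 MB target is arbitrary, and the floor of 200 is Spark's default value for spark.sql.shuffle.partitions.

```python
import math

GIB = 1024 ** 3


def shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * 1024 ** 2, minimum=200):
    """Pick a partition count that keeps each shuffle partition near a target size.

    Both the 128 MB target and the floor of 200 are assumptions for illustration.
    """
    return max(minimum, math.ceil(shuffle_bytes / target_partition_bytes))


# A small shuffle stays at the floor; a large one needs thousands of partitions.
print(shuffle_partitions(10 * GIB), shuffle_partitions(400 * GIB))
```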
Design challenges
NFS/GFS
Data source?
Algorithms?
Persistence?
Design challenges -- SNMP
Near real-time model updates needed: Lambda architecture
• Batch job MUST process data at a fixed interval (e.g., 15 min)
• Stream job MUST
  > Handle hot starts (e.g., 90 days of data)
  > Analyze data and generate anomalies
  > Update the model every sampling interval
  > Start from the last model timestamp on restart
Coordination between batch and stream processes NEEDED
• Batch job updates a ZooKeeper node at a fixed interval (e.g., 15 min)
• Stream job uses the same ZooKeeper node to load features
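The batch/stream hand-off can be sketched with an in-memory stand-in for the ZooKeeper node. Everything here is hypothetical: the slides do not say what the node stores, so the sketch assumes it holds the timestamp of the latest feature set, and it uses a plain dict in place of real feature storage.

```python
class InMemoryNode:
    """In-memory stand-in for a ZooKeeper node (hypothetical, for illustration)."""

    def __init__(self):
        self._value = None

    def set(self, value):
        self._value = value

    def get(self):
        return self._value


def batch_job(node, feature_store, now, interval_s=15 * 60):
    """Write features for the latest interval boundary, then publish its timestamp."""
    boundary = now - now % interval_s
    feature_store[boundary] = f"features@{boundary}"
    # Publish only after the features are written, so readers never see a gap.
    node.set(boundary)
    return boundary


def stream_load_features(node, feature_store):
    """Load whichever feature set the batch job published last, if any."""
    ts = node.get()
    return None if ts is None else feature_store[ts]


node, store = InMemoryNode(), {}
before = stream_load_features(node, store)
batch_job(node, store, now=1000)
after = stream_load_features(node, store)
print(before, after)
```

The stream job only ever reads the node, so the batch job stays the single writer and the two processes need no other shared state.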
Design challenges -- NetFlow
Seed the model with good parameter estimates
• Batch job populates the initial model parameters
• Stream job hot-starts with the model and detects anomalies
• Stream job updates the model and persists it to Cassandra
Model maintained in Cassandra
• Stream job reads the model from Cassandra into Spark partitions
• Each Spark partition updates the model
• Each Spark partition generates anomalies
• Models across partitions are combined using Spark
• Anomalies are persisted to Cassandra
Network analysis
• Find peak usage for a given customer
• Find the customer with the highest network usage
• Find the number of distinct source IPs connected to a destination IP
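Updating a model per partition and then combining the partial models works naturally when the model is a set of mergeable statistics. The slides do not specify the parametric model; the sketch below assumes a simple Gaussian summary (count, mean, sum of squared deviations) with Welford's online update and the standard parallel merge formula. The `Moments` class and sample values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean and sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Welford's online update for one observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial models using the parallel variance formula."""
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def zscore(self, x: float) -> float:
        """Distance of x from the mean in standard deviations (0 if undefined)."""
        std = math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0
        return (x - self.mean) / std if std > 0 else 0.0


def fit(values):
    """Build a model from a sequence, as one partition would."""
    model = Moments()
    for v in values:
        model.update(v)
    return model


left = fit([10.0, 12.0, 11.0])    # partition 1
right = fit([9.0, 13.0, 11.0])    # partition 2
combined = left.merge(right)
whole = fit([10.0, 12.0, 11.0, 9.0, 13.0, 11.0])
print(combined.mean, combined.zscore(40.0))
```

Because `merge` is associative, it could serve as the combining function in a Spark reduce over partitions, and the merged result matches a single pass over all the data.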
Network anomaly flow design
Design challenges – multiple applications
Trapezium
What is Trapezium?
• Ability to read data
  > From multiple data sources, e.g., HDFS, NFS, Kafka
  > In batch and streaming modes to support lambda architecture
• Ability to write data
  > To multiple destinations, e.g., HDFS, NFS, Kafka
• Plug and play architecture
  > Evaluate multiple algorithms
  > Evaluate different features of the same algorithm
• Break down a complex analytics problem into Transactions
• Build a workflow pipeline combining different Transactions
• Validation and filtering of input data
• Embedded ZooKeeper, Kafka, C*, HBase, etc. available for unit tests
• Enable real-time query processing capability
  > Akka HTTP server provides Spark as a Service
Trapezium architecture
[Figure: Trapezium architecture. Labels: data sources D1, D2, D3; validation (V1); various transactions; outputs O1, O2, O3]
Workflow
hdfsFileBatch = {
  batchTime = 5
  batchInfo = [{
    name = "hdfs_source"
    dataDirectory = {prod = "/prod/data/files"}
  }]
}
transactions = [{
  transactionName = "com.verizon.bda.DataAggregator"
  inputData = [{ name = "hdfs_source" }]
  persistDataName = "aggregatedOutput"
}, {
  transactionName = "com.verizon.bda.DataAligner"
  inputData = [{ name = "aggregatedOutput" }]
  persistDataName = "alignedOutput"
}, {
  transactionName = "com.verizon.bda.AnomalyFinder"
  inputData = [{ name = "aggregatedOutput" }, { name = "alignedOutput" }]
  persistDataName = "anomalyOutput"
}]
• Workflow is a collection of transactions in batch or streaming mode
• Each transaction can take multiple data sources as input
• Output of one transaction can be input to another transaction
• Output of each transaction could be persisted or kept only in memory
• Single place to handle exceptions and raise failure events
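Since the output of one transaction can feed another, the workflow forms a dependency graph. The sketch below derives a valid execution order from the names in the workflow config using a topological sort. It illustrates the dependency idea only and is not how Trapezium itself schedules transactions, which the slides do not describe; the dict layout and `execution_order` name are assumptions.

```python
from graphlib import TopologicalSorter

# The three transactions from the workflow config, deliberately listed out of order.
transactions = [
    {"name": "com.verizon.bda.AnomalyFinder",
     "inputs": ["aggregatedOutput", "alignedOutput"], "output": "anomalyOutput"},
    {"name": "com.verizon.bda.DataAligner",
     "inputs": ["aggregatedOutput"], "output": "alignedOutput"},
    {"name": "com.verizon.bda.DataAggregator",
     "inputs": ["hdfs_source"], "output": "aggregatedOutput"},
]


def execution_order(transactions):
    """Order transactions so each runs after the ones that produce its inputs."""
    producer = {t["output"]: t["name"] for t in transactions}
    # Inputs with no producer (e.g. hdfs_source) are external data sources.
    graph = {
        t["name"]: {producer[i] for i in t["inputs"] if i in producer}
        for t in transactions
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(transactions)
print(order)
```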
Transaction Traits
Supported data sources
• Trapezium can read data from HDFS, Kafka, NFS, GFS
• Config entry for reading data from HDFS/NFS/GFS
  dataSource = "HDFS"
  dataDirectory = {
    local = "/local/data/files"
    dev = "/dev/data/files"
    prod = "/prod/data/files"
  }
• Config entry for defining protocol
  fileSystemPrefix = "hdfs://"
  fileSystemPrefix = "file://"
  fileSystemPrefix = "s3://"
• Trapezium can read data in various formats including text, gzip, json, avro and parquet
• Config entry for reading from Kafka topics
  kafkaTopicInfo = {
    consumerGroup = "KafkaStreamGroup"
    maxRatePerPartition = 970
    batchTime = "5"
    streamsInfo = [{
      name = "queries"
      topicName = "deviceanalyzer"
    }]
  }
• Config entry for reading fileFormat
  fileFormat = "avro"
  fileFormat = "json"
  fileFormat = "parquet"
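A per-environment dataDirectory and a fileSystemPrefix suggest that the input location is assembled from the two. The sketch below shows one plausible way to do that; how Trapezium actually combines these entries is not stated in the slides, so the `resolve_path` helper and the simple concatenation are assumptions.

```python
def resolve_path(config, environment):
    """Join the protocol prefix with the directory configured for an environment."""
    prefix = config.get("fileSystemPrefix", "")
    directory = config["dataDirectory"][environment]
    return prefix + directory


# Values copied from the config entries above, held in a plain dict.
config = {
    "dataSource": "HDFS",
    "fileSystemPrefix": "hdfs://",
    "dataDirectory": {
        "local": "/local/data/files",
        "dev": "/dev/data/files",
        "prod": "/prod/data/files",
    },
}
print(resolve_path(config, "prod"))
```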
Run modes
• Trapezium supports reading data in batch as well as streaming mode
• Config entry for reading in batch mode
  runMode = "BATCH"
  batchTime = 5
• Config entry for reading in stream mode
  runMode = "STREAM"
  batchTime = 5
• Read data by timestamp
  offset = 2
• Process historical data in a sequence of smaller data sets
  fileSplit = true
• Process same data multiple times
  oneTime = true
Data validation
• Validates data at the source
• Filters out all invalid rows
• Validates schema of the input data
• Config entry for data validation
  validation = {
    columns = ["name", "age", "birthday", "location"]
    datatypes = ["String", "Int", "Timestamp", "String"]
    dateFormat = "yyyy-MM-dd HH:mm:ss"
    delimiter = "|"
    minimumColumn = 4
    rules = {
      name = [maxLength(30), minLength(1)]
      age = [maxValue(100), minValue(1)]
    }
  }
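The effect of the validation entry can be illustrated with a small row filter that applies the same columns, datatypes and rules. This is a sketch of the behaviour the config implies, not Trapezium's validator; the `validate_row` function and the sample rows are made up.

```python
from datetime import datetime

COLUMNS = ["name", "age", "birthday", "location"]
DELIMITER = "|"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def validate_row(line):
    """Return a parsed dict for a valid row, or None for an invalid one."""
    fields = line.split(DELIMITER)
    if len(fields) < len(COLUMNS):                       # minimumColumn = 4
        return None
    name, age, birthday, location = fields[:4]
    if not 1 <= len(name) <= 30:                         # minLength(1), maxLength(30)
        return None
    try:
        age_value = int(age)                             # datatype Int
        born = datetime.strptime(birthday, DATE_FORMAT)  # datatype Timestamp
    except ValueError:
        return None
    if not 1 <= age_value <= 100:                        # minValue(1), maxValue(100)
        return None
    return {"name": name, "age": age_value, "birthday": born, "location": location}


rows = [
    "Ada|36|1980-12-10 08:30:00|London",
    "Bob|abc|1975-01-01 00:00:00|Paris",   # age is not an integer
    "Cy|150|1870-01-01 00:00:00|Rome",     # age above maxValue
    "Dee|40|1976-05-05 12:00:00",          # too few columns
]
valid = [r for r in map(validate_row, rows) if r is not None]
print(len(valid), valid[0]["name"])
```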
Plug and play capability
• Any transaction can be added/removed by modifying the workflow config file
• Output from multiple algorithms can be compared in real time
• Multiple features can be evaluated in different transactions
• Data sources can be switched with config change
• Model training can be done on different time windows to achieve best results
Trapezium – GitHub URL
https://github.com/Verizon/trapezium
Version: 1.0.0-SNAPSHOT
Release: 14-Oct-2016
Results
SNMP: Spark runtime with Hive/C* read/write
Data volume: 10 routers, 2.2 MB per 5 min, 650 MB per day
Compute: 10 executors, 4 cores
Memory: 16 GB per executor, 4 GB driver
With sampling rate of 2 min:
• 2 nodes with 20 cores each for 10 routers
• 200 nodes for 1000 routers
With sampling rate of 4 min:
• 2 nodes can process 20 routers
• 100 nodes for 1000 routers
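The node counts for 1000 routers follow from a linear extrapolation of the measured 2-node capacity. The helper below reproduces that arithmetic; the assumption that capacity scales linearly with node count is taken from the figures above, and the `nodes_needed` name is hypothetical.

```python
import math


def nodes_needed(routers, routers_per_group, nodes_per_group=2):
    """Linear extrapolation: each group of nodes handles a fixed number of routers."""
    return math.ceil(routers / routers_per_group) * nodes_per_group


# 2-minute sampling: 2 nodes handle 10 routers.
print(nodes_needed(1000, routers_per_group=10))
# 4-minute sampling: 2 nodes handle 20 routers.
print(nodes_needed(1000, routers_per_group=20))
```

Doubling the sampling interval halves the work per unit of time, which is why the same two nodes cover twice as many routers.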
SNMP: Spark shuffle – read/write
Data volume: 10 routers, 2.2 MB per 5 min, 650 MB per day
Compute: 10 executors, 4 cores
Memory: 16 GB per executor, 4 GB driver
NetFlow: Spark + C* read/write runtime
Data volume: 2 routers, 50 MB per min, 70 GB per day
Compute: 10 executors, 4 cores
Memory: 16 GB per executor, 4 GB driver
• Due to the parametric model, run time is better than for SNMP
• NetFlow data is X times more than SNMP data
Routers 2 4 8 16 32
Run time (s) 16 18 32 47 94.8
NetFlow: Spark + C* shuffle write
Shuffle (MB) by number of routers
Routers 2 4 8 16 32
Spark 71.2 150.5 275.7 612.1 1261.4
Cassandra 30.2 64.4 115.6 263.7 545.1
Summary
• Reuse code across multiple applications
• Improve developer efficiency
• Encourage standard coding practices
• Provide unit-test framework for better code coverage
• Decouple ETL, analytics and algorithms in different Transactions
• Distribute query processing using Spark as a service
• Easy integration provided by configuration driven architecture
Thank you