100G Network Monitoring Irvine, CA March 11th, 2015 …rich/Bro/CENIC-100GMonitoring.pdf · Python...
100G Network Monitoring with Bro and Time Machine
Vincent Stoffer, Cyber Security Engineer
CENIC Conference, March 11th, 2015, Irvine, CA
UNIVERSITY OF CALIFORNIA
CENIC 2015
Lawrence Berkeley National Laboratory
● Located in Berkeley, CA
● "Bringing science solutions to the world"
● Unclassified DoE research facility operated by University of California
● Functions much like a research university
Overview
● ~5000 users, ~10,000 hosts
● Distributed computing resources
● Many guests and visitors
● Open network to enable collaboration and research
Computing overview
● Expensive hardware
  ○ No “product” solution
● Overall traffic volume
  ○ overwhelming sensors
  ○ log volumes
● Elephant flows
  ○ Scaling up and down
● Maintain same visibility and protections
100G monitoring challenges
● Optical taps
  ○ 100G, 10G, 1G
● Collect at packet broker
  ○ Previously expensive proprietary hardware
  ○ Merchant silicon changed the game
● Send out to monitoring devices
Overview
Apcon, 10G monitor devices installed @LBL 2007
cPacket cVu, 10G monitor devices installed @LBL 2011
Arista, 100G monitor devices installed @LBL 2015
● Mostly flat network
● Simple tapping setup
  ○ External & Internal
  ○ Dynamic “firewall” in the middle
● Apcon -> cPacket tapping infrastructure
10G @ LBL since 2007
100G Berkeley Lab approach
● Scale up our setup on 10G
● Moving from duplication to advanced aggregation
● New device needed
● Based on previous work from Scott Campbell at NERSC
● 100G and 10G ports
● Filtering at ingress & egress
● Port speed agnostic
● Aggregation, symmetric load-balancing
● No oversubscription limits
● API for dynamic filtering/shunting
● Filtering for arbitrary IP headers and TCP flags
● Every port can be input/output
● Create port groups
● Send output to load-balanced groups and single ports
● IPv6 support
100G Device requirements
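"Symmetric load-balancing" in the requirements means that both directions of a connection must be delivered to the same monitoring node. The Python sketch below shows one way to get that property, assuming a hash over the sorted endpoint pair. The function name `worker_for`, the use of CRC32, and the address 128.3.0.1 are illustrative assumptions; this is not how the switch hardware computes its hash.

```python
import zlib

def worker_for(src_ip, src_port, dst_ip, dst_port, n_workers):
    """Map a flow to a worker so that both directions pick the same one."""
    # Sorting the two endpoints makes the key independent of direction.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}".encode()
    return zlib.crc32(key) % n_workers

# Hypothetical flow, checked in both directions across 50 workers.
forward = worker_for("128.3.0.1", 45191, "207.62.80.166", 80, 50)
reverse = worker_for("207.62.80.166", 80, "128.3.0.1", 45191, 50)
print(forward == reverse)  # True
```

Without the sort, the request and the response of one connection could hash to different workers, and neither would see the full conversation.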
● Commercial / Appliance
● Commodity network (proprietary / hybrid)
● Commodity network + SDN (scipass/flowscale)
100G Monitoring device options
● Flexible interface including GUI
● High density - 6 port 100G line card (supports LR-4) plus 144 10G ports!
● Easy to use API
  ○ dynamic shunting!
● Relatively low cost
● Lots of peers using
We chose Arista
10G Cluster (cPacket + Force10 + 12 Supermicros), LBL since 2007
Cluster-in-a-box (Arista + Myricom + 1 Supermicro)
General Architecture
● Split 100Gb link into 5 (or more) streams of 10G to feed each node
● Further divide each 10G stream into 10x1Gb so each of the worker nodes sees 1/50th of the traffic
● When our sustained traffic is 20Gbps (high estimate), each worker sees about 400 Mbps of the traffic
● Scale up as necessary
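The per-worker figure follows directly from the split. A minimal sketch of the arithmetic, assuming traffic is spread perfectly evenly across workers (an idealization); the function name `per_worker_mbps` is made up for illustration.

```python
def per_worker_mbps(total_gbps, streams=5, workers_per_stream=10):
    """Traffic each worker sees if load is spread evenly (an idealization)."""
    n_workers = streams * workers_per_stream
    return total_gbps * 1000 / n_workers

print(per_worker_mbps(20))   # 400.0  (the sustained high estimate)
print(per_worker_mbps(100))  # 2000.0 (a fully loaded 100G link)
```

The second number shows why the layout has to scale: at full line rate, 50 workers would each need to absorb 2 Gbps.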
● Sniffer10G
  ○ Support for Linux, FreeBSD
  ○ Myricom 10G cards only
  ○ Supports only one tool in 2.0 (multiple tools in 3.0)
  ○ Company/IP in some flux
Network cards - Myricom
Shunting
● “Heavy Tail Effect*” is the observation that a small number of network flows dominate the overall volume of data transferred in a given time period
● By detecting and removing the data component of these “heavy tail” flows, analysis load is dramatically reduced without sacrificing security
*Scott Campbell’s work
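To make the heavy-tail observation concrete, the sketch below computes how much of the total volume the largest few flows carry. The flow sizes are made-up numbers chosen for illustration, not measurements from LBL.

```python
def top_flow_share(flow_bytes, top_n):
    """Fraction of total bytes carried by the top_n largest flows."""
    ordered = sorted(flow_bytes, reverse=True)
    return sum(ordered[:top_n]) / sum(ordered)

# Made-up example: 2 elephant flows among 1000 small ones.
flows = [50_000_000_000, 30_000_000_000] + [20_000_000] * 1000
print(round(top_flow_share(flows, 2), 2))  # 0.8
```

In this toy data set, shunting just two flows would remove 80% of the bytes the sensors have to inspect.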
● Exclusions (IP pairs, netblocks, ports/protocols)
  ○ Research networks / affiliates
  ○ Resnet?
● Identify Elephant flows
  ○ allow Control traffic
● Dynamic - Holy Grail
  ○ Bro, API, near real time
Filters for Shunting
● Python program for shunting
● Written by Justin Azoff
● Uses Arista JSON API to add ACLs which allow only control packets
● Bro’s reaction framework feeds data in real time
● Connection details are preserved
Dumbno
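The sketch below shows roughly what a shunt request over the Arista JSON API could look like. It only builds the JSON-RPC body (method `runCmds`) and sends nothing. The ACL name `bulk_1`, the sequence number, the example addresses, and the exact rule text are assumptions for illustration, and this is not Dumbno's actual code. It also assumes the ACL already begins with lower-numbered rules that permit TCP control packets (SYN/FIN/RST), so the added deny drops only the flow's data packets.

```python
import json

def shunt_request(acl, seq, proto, src, sport, dst, dport):
    """Build a JSON-RPC body asking the switch to drop one flow's data packets."""
    cmds = [
        "enable",
        "configure",
        f"ip access-list {acl}",
        # Hypothetical rule text; control packets are assumed to be
        # permitted by earlier entries in the same ACL.
        f"{seq} deny {proto} host {src} eq {sport} host {dst} eq {dport}",
    ]
    return {
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": cmds, "format": "json"},
        "id": "shunt-1",
    }

body = shunt_request("bulk_1", 1000, "tcp", "192.0.2.10", 45191, "198.51.100.20", 443)
print(json.dumps(body, indent=2))
```

In a real deployment the body would be sent over HTTPS to the switch's command API, and the same rule would be removed again once the connection ends.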
Load Balancer: Arista (7504+7150)
Traffic split/node: Myricom 10G-PCIE2-8C2, Myricom 10G sniffer drivers
IDS: Bro
UNIX OS: FreeBSD-10.1

Load Balancer: Arista, Brocade, Endace, Gigamon, Open Flow, others?
Traffic split/node: PF_RING, Packet Bricks + netmap, Endace DAG
IDS: Snort, Suricata
UNIX OS: Linux, FreeBSD
This table provides alternative tools and technologies for various parts of a 100G monitoring system.
● Know thy network
● Focus on people not products
● Commodity hardware
● UNIX/Linux focused
● Free & open source software
● Super adaptable
Open Source Network Monitoring Philosophy
Not your typical IDS/IPS
● A monitoring platform
  ○ A standalone network monitor
  ○ A programmable framework
  ○ An ecosystem
What is Bro? www.bro.org
● Commodity servers (Supermicro)
● Linux/FreeBSD
● Network cards (Intel, Myricom, high end DAG)
Hardware
[Figure: Bro platform. Labels: Network Traffic, Tap, Bro Platform, Packet Processing, Programming Language, Standard Library, Apps, Intrusion Detection, VulnMgmt, File Analysis, Log Recording, Custom Logic]
● Connection logs
● Protocol logs
● Custom logs
● Alerting and debug logs
● Log formats:
  ○ ASCII (plain text, default)
  ○ Elasticsearch
  ○ SQLite
  ○ Dataseries (HP) binary output
Bro log types
>ls *.log
app_stats.log      notice.log
communication.log  reporter.log
conn.log           smtp.log
dhcp.log           socks.log
dns.log            software.log
dpd.log            ssh.log
files.log          ssl.log
ftp.log            stderr.log
http.log           stdout.log
irc.log            syslog.log
known_certs.log    traceroute.log
known_hosts.log    tunnel.log
weird.log          modbus.log
● Netflow ++
● Stateful connection records
● Includes “originator” and “responder”
● Total byte counts, connection times, history and more
Bro connection logs (conn.log)
Mar 3 16:35:36 ClmuHr1gC6p76JbdVl 128.3.x.x 45191 207.62.80.166 80 tcp http 0.023945 351 9886 SF T 0 ShADadfF 6 671 11 10466 (empty) worker-2-5
conn.log
Field Value Description
ts 1425429336.809148 UNIX timestamp
uid ClmuHr1gC6p76JbdVl Unique ID
id.orig_h 128.3.x.x Originator IP
id.orig_p 45191 Originator port
id.resp_h 207.62.80.166 Responder IP
id.resp_p 80 Responder port
proto tcp IP Protocol
service http Application protocol
duration 0.023945 Duration
orig_bytes 351 Bytes by originator
resp_bytes 9886 Bytes by responder
history ShADadfF State history
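Because the default ASCII logs are tab-separated and carry a `#fields` header line, they are easy to consume from a script. The sketch below is a minimal parser, assuming that default layout. The function name `parse_bro_log` is made up, and the sample record uses only the fields from the table above, with the site address left anonymized.

```python
def parse_bro_log(lines):
    """Yield one dict per record from a tab-separated Bro ASCII log."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names every column, in order.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split("\t")))

sample = [
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto"
    "\tservice\tduration\torig_bytes\tresp_bytes",
    "1425429336.809148\tClmuHr1gC6p76JbdVl\t128.3.x.x\t45191\t207.62.80.166"
    "\t80\ttcp\thttp\t0.023945\t351\t9886",
]
for rec in parse_bro_log(sample):
    total = int(rec["orig_bytes"]) + int(rec["resp_bytes"])
    print(rec["uid"], rec["service"], total)
# ClmuHr1gC6p76JbdVl http 10237
```

The `uid` value is the key that ties this record to the matching entries in the protocol logs.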
● Full protocol level details
● Configurable
● Unique ID consistent across all logs
● Contents based on protocol
Bro application logs
Mar 3 16:35:36 CHlGTa39L4ViNKf5wb 128.3.x.x 32609 131.243.5.1 53 udp 52600 cenic2015.cenic.org 1 C_INTERNET 1 A 0 NOERROR F F T T 0 207.62.80.166 7973.000000 F
dns.log
Mar 3 16:35:36 ClmuHr1gC6p76JbdVl 128.3.x.x 45191 207.62.80.166 80 1 GET cenic2015.cenic.org / - Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36 0 9695 200 OK - - - (empty) - - - - - FrQ9Ct3IucTKymFao7 text/html HOST,CONNECTION,ACCEPT,USER-AGENT,DNT,ACCEPT-ENCODING,ACCEPT-LANGUAGE - - /
http.log
● Ground truth for your network (Know thy network)
● Troubleshooting
● Analytics / reporting
● DFIR
● Use to build alerts and take actions
Great, but what do I need all that for?
● Bro is event based
● Almost any event can trigger a notice (notice.log)
● Then you can take action
● More typical IDS function
Notices / Alerts
Address_Seen
Scan::Address_Scan
Scan::Port_Scan
SSH::Password_Guessing
Traceroute::Detected
NTP::NTP_Monlist_Queries
SSL::Invalid_Server_Cert
SMTPurl::SMTP_Link_in_EMAIL_Clicked
SMTPurl::SMTP_WatchedFileType
SMTPurl::SMTP_Embeded_Malicious_URL
HTTP::HTTP_SensitiveURI
HTTP::SQL_Injection_Attacker
Software::Vulnerable_Version
TeamCymruMalwareHashRegistry::Match
Some example notices
● Notify via email/SMS/etc.
● Shell scripts
● Firewall/device integration
● ACLd
● Total flexibility
Alert actions
● Core - Generates events
● Scripting - Does stuff with them
Not a “signature” though of course there is a way to do that :)
Bro policy
● Don’t ask what Bro can do, better to ask what do you want to do?
  ○ NTP monlist
  ○ SIP scanners
  ○ Tor ban
  ○ SMTP URL
  ○ SSH foreign login
Bro policy philosophy
● But Bro can do everything??!!
● Bro provides us amazing metadata and beyond, but we sometimes need more
● Enter Time Machine
Beyond Bro?
● Stefan Kornexl
● Graduate thesis project
● Technische Universität München
Stefan Kornexl, Vern Paxson, Holger Dreger, Anja Feldmann, and Robin Sommer. 2005. Building a time machine for efficient recording and retrieval of high-volume network traffic. In Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement (IMC '05). USENIX Association, Berkeley, CA, USA
Time Machine background
● Creates pcap files with indexes
● Killer feature: "connection cutoff"
● Cutoffs defined per port
● Assumption: interesting stuff in the first N bytes
Time Machine
class "smtp" {
    filter "port 25 or port 587";
    cutoff 25m;
    filesize 2000m;
}
class "encrypted" {
    filter "port 22 or port 443";
    cutoff 500k;
    filesize 2000m;
}
Time Machine config
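The effect of a per-class cutoff can be sketched in a few lines of Python: each connection contributes at most its class's cutoff to storage. This is a simplified model, not Time Machine's implementation; the helper names are made up, and reading `k`/`m` as 1024-based multipliers is an assumption.

```python
def parse_size(text):
    """Convert sizes such as '500k' or '25m' to bytes (suffix meaning assumed)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)

def bytes_kept(connection_bytes, cutoff):
    """Bytes stored for one connection once the cutoff is applied."""
    return min(connection_bytes, parse_size(cutoff))

# A hypothetical 2 GiB SSH transfer under the "encrypted" class (cutoff 500k)
print(bytes_kept(2 * 1024 ** 3, "500k"))  # 512000
# A hypothetical 40 KiB mail under the "smtp" class (cutoff 25m) is kept whole
print(bytes_kept(40 * 1024, "25m"))       # 40960
```

Small connections are recorded in full, while a bulk transfer costs only its first 500k of storage.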
● Average 2-4 Gb/s
● Spikes to 10-20 Gb/s
● Roughly 25 TB / day full traffic
● 750 TB / month!
Traffic numbers
● Our goal was 6 months of packet capture
● With full traffic, we could do <1 week
● After multiple iterations/tuning of our buckets
Storage
March 2015 config
buckets      capture MB   daily GB   6mo TB
http         5            500
smtp         25           50
encrypted    500k         200
udp          5            20
icmp         64k          1
53 tcp/udp   5            15
else         5            150
TOTAL                     936        170   (from 750 TB / month!)
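The totals can be checked with a few lines of Python. The sketch assumes six months is about 182 days and that 1 TB = 1000 GB; the per-bucket daily volumes and the 25 TB/day figure come from the slides.

```python
daily_gb = {"http": 500, "smtp": 50, "encrypted": 200, "udp": 20,
            "icmp": 1, "53 tcp/udp": 15, "else": 150}

total_daily_gb = sum(daily_gb.values())
six_month_tb = total_daily_gb * 182 / 1000   # assuming ~182 days, 1 TB = 1000 GB
full_capture_tb = 25 * 182                   # 25 TB/day of full traffic

print(total_daily_gb)        # 936
print(round(six_month_tb))   # 170
print(full_capture_tb)       # 4550
```

Under these assumptions, six months of tuned capture needs roughly 170 TB, where full capture would need about 4550 TB.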
● Unless you are under regulatory requirements, doing full packet capture is probably wrong
● Once tuned, we want more horizontal but not more vertical (shallow TM)
● Incidents (SIP)
But it’s not full packet capture...
Buckets  Number of conns  threshold  conns < threshold  conns > threshold  Capture coverage with Threshold (%)  Capture size  Actual traffic on the wire
udp      13,149,143       5M         13,142,093         7,050              99.94                                20 G          400 G
http     21,586,940       5M         21,568,519         18,421             99.91                                480 G         6100 G
https    8,332,603        500K       8,207,340          125,263            98.49                                200 G         2300 G
icmp     5,168,723        64K        5,168,004          719                99.98                                935 M         984 M
smtp     1,005,569        25M        1,005,400          169                99.98                                60 G          66 G
dns      53,450,492       5M         53,450,434         58                 99.99                                17 G          9 G
ssh      4,445,375        500K       4,443,373          2,002              99.95                                2 G           2100 G
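The coverage column is the share of connections that finished below the cutoff and were therefore captured in full. A minimal sketch of that calculation follows; truncating to two decimals rather than rounding is an inference, made because it reproduces the figures in the table.

```python
import math

def coverage_pct(total_conns, conns_below_threshold):
    """Percentage of connections captured in full, truncated to 2 decimals."""
    return math.floor(conns_below_threshold / total_conns * 10000) / 100

print(coverage_pct(13_149_143, 13_142_093))  # 99.94 (udp)
print(coverage_pct(8_332_603, 8_207_340))    # 98.49 (https)
```

Even for https, the bucket with the lowest coverage, over 98% of connections are stored completely while the capture is less than a tenth of the traffic on the wire.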
● Indexes may be helpful
● TCPdump as the retrieval interface (BPF)
● Command line ‘find’ in your buckets
● Off to wireshark or whatever
Time machine - retrieval
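The find-then-tcpdump workflow can be sketched as a small Python helper that builds one tcpdump command per capture file. The directory `/data/tm`, the `class_http_*` file pattern, and the output naming are hypothetical; only the tcpdump flags `-r` (read a capture file) and `-w` (write matching packets) and the BPF filter syntax are standard. The helper only builds the commands and does not run them.

```python
import shlex
from pathlib import Path

def tcpdump_commands(bucket_dir, pattern, bpf_filter, out_dir):
    """Build one tcpdump invocation per capture file matching pattern."""
    commands = []
    for pcap in sorted(Path(bucket_dir).glob(pattern)):
        out = Path(out_dir) / f"match_{pcap.name}"
        # -r reads a saved capture, -w writes the packets that match the filter
        commands.append(["tcpdump", "-r", str(pcap), "-w", str(out), bpf_filter])
    return commands

# Hypothetical bucket location and file naming.
for cmd in tcpdump_commands("/data/tm", "class_http_*",
                            "host 207.62.80.166 and port 80", "/tmp"):
    print(shlex.join(cmd))
```

Each command could then be run with `subprocess.run(cmd, check=True)` and the resulting files opened in Wireshark.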
● Bro connects to Time Machine
● Bro can request data from TM to pass to an analyst or to perform retroactive processing
Time machine - Bro
● IPv6 support (LBL branch)
● Indexes don’t persist between restarts (Fix coming?)
● Searching and collating can be a pain
● No searching above layer 4
Time machine - shortcomings
● Download Bro: www.bro.org
● Check out Security Onion: www.securityonion.net
● Time Machine: www.bro.org/community/time-machine.html
● Berkeley Lab 100G technical doc
How to get started