Network Flow Analysis

Network Flow Analysis. Mark Meiss. Presentation for NaN-Group, October 4, 2004.


Transcript of Network Flow Analysis

Page 1: Network Flow Analysis

Network Flow Analysis

Mark Meiss

Presentation for NaN-Group

October 4, 2004

Page 2: Network Flow Analysis

Overview

Data description

– The Internet2 (Abilene) data network

– Netflow traffic data

Data collection

Data analysis

– Techniques

– Preliminary results

Future work

Page 3: Network Flow Analysis

What is Abilene?

Internet2 (Abilene) is a nationwide high-speed data network for research and higher education.

– Network backbone runs at 10 Gbps

– Over 220 member institutions

– Peers with over 40 other research networks

Abilene uses the same protocols as Internet1 but only carries academic traffic.

– This is like the old NSFnet or vBNS

Page 4: Network Flow Analysis
Page 5: Network Flow Analysis

Why is Abilene Interesting?

The Abilene network is a transit network.

– It includes both international and domestic traffic.

– It offers a good view of server networks.

– Commercial transit networks do not share traffic data.

The Abilene network is uncongested.

– Statistics will not be biased by packet loss.

The Abilene network contains students.

– Students are unconcerned about niceties of law.

– There is a lot of peer-to-peer and “grey” traffic.

Page 6: Network Flow Analysis

What is “Netflow”?

In the early 1990’s, Cisco introduced a new network router architecture.

The “line cards” in their new routers contained a hardware hash table for current network connections.

Somebody got the bright idea of sending entries from the table onto the network before clearing them from the hash table.

Page 7: Network Flow Analysis

What is a Network Flow?

A network flow consists of one or more packets sent from a source (IP, port) to a destination (IP, port) using a certain transport protocol during some time interval.

Example:

Source: 156.56.103.1, port 80

Dest.: 149.159.250.21, port 6132

Protocol: TCP

Packets: 20

The above network flow would be typical for a Web connection.
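
The definition above maps naturally onto a small record type. The following is a minimal sketch, not part of the original slides: the class and field names (`FlowKey`, `Flow`, `src_ip`, and so on) are illustrative assumptions, and the protocol constants are the standard IP protocol numbers.

```python
from dataclasses import dataclass

# Standard IP protocol numbers for the transport protocols discussed here.
ICMP, TCP, UDP = 1, 6, 17


@dataclass(frozen=True)
class FlowKey:
    """The 5-tuple that identifies a flow (hypothetical field names)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


@dataclass
class Flow:
    """A flow key plus the packet count observed for it."""
    key: FlowKey
    packets: int


# The example flow from the slide: a Web server (port 80) answering a client.
example = Flow(FlowKey("156.56.103.1", 80, "149.159.250.21", 6132, TCP), packets=20)
```

Making the key frozen keeps it hashable, so it can serve directly as a dictionary key when flows are counted or merged.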

Page 8: Network Flow Analysis

Wait a Minute!

Don’t TCP connections involve two-way communication?

– Yes, so every TCP connection is actually two flows from the point of view of Netflow.

UDP and ICMP are stateless, so how can they be aggregated into flows?

– We assume that packets with matching 5-tuples during some period of time are part of the same flow.

Isn’t it hard for a router to keep up with this?

– Yes, so most modern routers sample the flow data at a ratio of about 100:1.
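
The three answers above can be sketched in a few lines. This is an illustration under stated assumptions rather than how a router actually does it: the `Packet` tuple, the 30-second `idle_timeout`, and the deterministic every-Nth-packet sampling are all hypothetical choices, and real routers apply further expiry rules not modeled here.

```python
from collections import namedtuple

# Hypothetical packet summary: arrival time in seconds plus the 5-tuple.
Packet = namedtuple("Packet", "time src_ip src_port dst_ip dst_port protocol")


def aggregate(packets, idle_timeout=30.0, sample_every=1):
    """Group packets into flows by 5-tuple.

    A packet extends an existing flow if its 5-tuple matches and it arrives
    within idle_timeout seconds of that flow's last packet; otherwise it
    starts a new flow. Only every sample_every-th packet is examined.
    Returns a list of (5-tuple, first_time, last_time, packet_count).
    """
    active = {}    # 5-tuple -> [first_time, last_time, packet_count]
    finished = []
    for i, p in enumerate(sorted(packets, key=lambda p: p.time)):
        if i % sample_every != 0:
            continue  # deterministic 1-in-N sampling
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        rec = active.get(key)
        if rec is not None and p.time - rec[1] > idle_timeout:
            finished.append((key, *rec))  # the old flow has expired
            rec = None
        if rec is None:
            active[key] = [p.time, p.time, 1]
        else:
            rec[1] = p.time
            rec[2] += 1
    finished.extend((key, *rec) for key, rec in active.items())
    return finished
```

Because the key includes direction, the two halves of a TCP connection come out as two separate flows, and with `sample_every=100` the reported packet counts are roughly one hundredth of the true counts.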

Page 9: Network Flow Analysis

How is Netflow transmitted?

Most modern routers support the “Netflow v5” format for representing flows.

– This includes a variety of additional information about each flow.

The router uses UDP to send packets containing between 1 and 30 flow records to a management workstation.

– (In this case, the management workstation is sitting on my desk.)
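
The limit of 30 records follows from packet size. As a rough sketch, assuming the 24-byte v5 header and 48-byte v5 flow record, a full export packet carries 24 + 30 × 48 = 1464 bytes of payload, which still fits in a standard 1500-byte Ethernet frame once IP and UDP headers are added. The function names below are hypothetical.

```python
HEADER_BYTES = 24   # size of a Netflow v5 header
RECORD_BYTES = 48   # size of one Netflow v5 flow record
MAX_RECORDS = 30    # upper bound on records per export packet


def datagram_size(n_records):
    """UDP payload size for an export packet carrying n_records flows."""
    if not 1 <= n_records <= MAX_RECORDS:
        raise ValueError("a v5 export packet carries 1 to 30 records")
    return HEADER_BYTES + n_records * RECORD_BYTES


def record_count(payload_len):
    """Infer the record count from a payload length, or None if malformed."""
    n, rem = divmod(payload_len - HEADER_BYTES, RECORD_BYTES)
    return n if rem == 0 and 1 <= n <= MAX_RECORDS else None
```

A collector can use a check like `record_count` to discard truncated datagrams before trying to decode them.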

Page 10: Network Flow Analysis

Netflow-v5 Header Format

Fields in wire order (24 bytes):

version number (2 bytes), # of flows in packet (2 bytes)

router uptime (ms) (4 bytes)

export time (sec. since 1970-01-01 00:00:00 UTC) (4 bytes)

export time (ns) (4 bytes)

sequence number (4 bytes)

engine type (1 byte), engine ID (1 byte), [padding] (2 bytes)
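
A header with these fields can be decoded with Python's `struct` module. The sketch below assumes the standard v5 wire order (version, count, uptime, seconds, nanoseconds, sequence, engine type, engine ID, two trailing bytes) in network byte order; the tuple field names are illustrative labels, not official ones.

```python
import struct
from collections import namedtuple

# "!" selects network byte order with no alignment padding: 24 bytes total.
HEADER = struct.Struct("!HHIIIIBBH")
Header = namedtuple(
    "Header",
    "version count uptime_ms export_secs export_nsecs sequence "
    "engine_type engine_id padding",
)


def parse_header(datagram):
    """Decode the 24-byte header at the start of an export datagram."""
    header = Header._make(HEADER.unpack_from(datagram, 0))
    if header.version != 5:
        raise ValueError(f"not a v5 packet (version {header.version})")
    return header
```

The `count` field tells the collector how many 48-byte flow records follow the header in the same datagram.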

Page 11: Network Flow Analysis

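Netflow-v5 Flow Record Format

Fields in wire order (48 bytes):

source IP address (4 bytes)

destination IP address (4 bytes)

IP address of next-hop router (4 bytes)

SNMP ifIndex (in) (2 bytes), SNMP ifIndex (out) (2 bytes)

total number of packets (4 bytes)

total number of octets (4 bytes)

router uptime at start of flow (ms) (4 bytes)

router uptime at end of flow (ms) (4 bytes)

source port (2 bytes), destination port (2 bytes)

[padding] (1 byte), TCP flags (1 byte), protocol (1 byte), ToS (1 byte)

source AS (2 bytes), destination AS (2 bytes)

source mask (1 byte), dest. mask (1 byte), [padding] (2 bytes)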
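
Each flow record can be decoded the same way as the header. This sketch assumes the standard v5 field order in network byte order, with the padding bytes skipped; the field names in the tuple are hypothetical labels for the fields listed on the slide.

```python
import socket
import struct
from collections import namedtuple

# "x" marks a padding byte that is skipped; the whole record is 48 bytes.
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")
Record = namedtuple(
    "Record",
    "src_ip dst_ip next_hop if_in if_out packets octets start_ms end_ms "
    "src_port dst_port tcp_flags protocol tos src_as dst_as src_mask dst_mask",
)


def parse_record(buf, offset=0):
    """Decode one 48-byte flow record starting at the given offset."""
    fields = RECORD.unpack_from(buf, offset)
    # The first three fields are raw IPv4 addresses; render them dotted-quad.
    ips = tuple(socket.inet_ntoa(raw) for raw in fields[:3])
    return Record(*ips, *fields[3:])
```

For a datagram with `n` records, record `k` (counting from zero) starts at offset `24 + 48 * k`.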

Page 12: Network Flow Analysis

How Much Data is There?

The Abilene routers generate between 700,000,000 and 800,000,000 flows per day.

– At 48 bytes per record, that amounts to around 35 GB of data.

– Flows come in at a rate of about 3.4 Mbps.

– Data compresses at a ratio of about 2.8:1.

Most existing tools can’t handle this volume of data.
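
These figures can be checked with a little arithmetic. The sketch below counts only the 48-byte records and ignores packet headers, so it is a lower bound; the function names are illustrative. At 700 to 800 million flows per day it gives about 33.6 to 38.4 GB, consistent with the "around 35 GB" quoted above.

```python
RECORD_BYTES = 48
SECONDS_PER_DAY = 86_400


def bytes_per_day(flows_per_day):
    """Raw record volume per day, ignoring packet headers."""
    return flows_per_day * RECORD_BYTES


def mean_rate_mbps(flows_per_day):
    """Average arrival rate in megabits per second."""
    return bytes_per_day(flows_per_day) * 8 / SECONDS_PER_DAY / 1e6


def compressed_gb(flows_per_day, ratio=2.8):
    """Approximate on-disk size in GB at the given compression ratio."""
    return bytes_per_day(flows_per_day) / ratio / 1e9


for flows in (700_000_000, 800_000_000):
    print(
        f"{flows:,} flows/day: {bytes_per_day(flows) / 1e9:.1f} GB raw, "
        f"{mean_rate_mbps(flows):.2f} Mbps, "
        f"{compressed_gb(flows):.1f} GB compressed"
    )
```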

Page 13: Network Flow Analysis

What’s the Motivation?

Okay, so I’m storing egregious amounts of data and making my hard drive whimper… what for?

Page 14: Network Flow Analysis

Flow Data as a Behavioral Network

Think of a single flow as defining an edge from a source node to a destination node.

The resulting network describes the Internet as it’s actually being used.

– Many possible biases are eliminated.

– A lot of dynamic information is included.

Most structural analysis of the Internet has (necessarily) focused on its physical structure.

Imagine a Google based on data about where people actually go!
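
Turning flows into a network is mechanically simple. A minimal sketch, assuming each flow has been reduced to a hypothetical `(src_ip, dst_ip, octets)` triple and that repeated flows between the same pair of hosts are merged into one weighted edge:

```python
from collections import defaultdict


def build_network(flows):
    """Build a directed, weighted edge map from (src_ip, dst_ip, octets) triples.

    Returns (nodes, edges), where edges maps (src_ip, dst_ip) to total octets.
    """
    edges = defaultdict(int)
    nodes = set()
    for src, dst, octets in flows:
        edges[(src, dst)] += octets   # merge repeated flows into one edge
        nodes.update((src, dst))
    return nodes, dict(edges)
```

Whether to weight an edge by octets, packets, or flow count is a modeling choice; octets are used here only as an example.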

Page 15: Network Flow Analysis

Behavioral Anomaly Detection

My main interest is in recognizing different types of behavior based on flow data.

– Can I determine whether a port is running a peer-to-peer application?

– Can I see the spread of a new worm across the network?

– Can I determine what kind of behavior is the prelude to an attack?

– Can I find new peer-to-peer applications before the word is out?

Page 16: Network Flow Analysis

Preliminary Results

I wish this section had more, but I’m really just getting off the ground…

The size of data has been a major challenge.

– The network formed by a day of flow data has about 29.7 million nodes and 128 million edges.

– Just finding a way of converting a set of captured flows to a sparse matrix representation has been difficult.
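
One common sparse representation is coordinate (COO) form: three parallel lists of row index, column index, and value. The sketch below is an assumption about what such a conversion could look like, not the method used for this data; it takes hypothetical `(src_ip, dst_ip, octets)` triples and assigns each address an integer index on first sight.

```python
def flows_to_coo(flows):
    """Convert (src_ip, dst_ip, octets) triples to COO sparse-matrix form.

    Returns (index, rows, cols, vals): index maps each address to its
    matrix row/column, and entry k of the matrix is
    M[rows[k]][cols[k]] = vals[k].
    """
    index = {}   # address -> integer id, assigned in order of appearance
    cells = {}   # (row, col) -> accumulated octets
    for src, dst, octets in flows:
        i = index.setdefault(src, len(index))
        j = index.setdefault(dst, len(index))
        cells[(i, j)] = cells.get((i, j), 0) + octets
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    vals = list(cells.values())
    return index, rows, cols, vals
```

At the scale quoted above (about 128 million edges), in-memory dictionaries like these become expensive, which is part of why the conversion is hard in practice. The resulting triplets are in the shape that libraries such as SciPy accept for building a `coo_matrix`.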

Page 17: Network Flow Analysis

Degree Distribution
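
A degree distribution counts how many nodes have each degree. The sketch below assumes an undirected reading, where a node's degree is its number of distinct neighbors regardless of flow direction; the slides do not say which definition was used, so treat this as one possible interpretation.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree to the number of nodes having that degree.

    edges is an iterable of (src, dst) pairs. Direction is ignored and
    duplicate or reversed pairs count once; self-loops are skipped.
    """
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    return Counter(len(adjacent) for adjacent in neighbors.values())
```

Plotting the result on log-log axes is the usual way to look for heavy-tailed behavior.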

Page 18: Network Flow Analysis
Page 19: Network Flow Analysis
Page 20: Network Flow Analysis

Determining Clients and Servers

Every network connection involves two hosts:

– The client is the system that initiates the connection.

– The server is the system that accepts the connection.

Because of sampling, we’re as likely to see the client-to-server side as the server-to-client side.

– This makes the direction basically meaningless.

We can guess which is which using the port information.

– The more common port number indicates the server.

– The less common port number indicates the client.
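
The port heuristic above can be sketched in two passes: first count how often each port number appears anywhere in the data, then label each flow's endpoints by comparing the two counts. The input format, the function name, and the tie-breaking rule (treat the destination as the server) are assumptions made for illustration.

```python
from collections import Counter


def guess_roles(flows):
    """Guess (client_ip, server_ip, server_port) for each flow.

    flows is an iterable of (src_ip, src_port, dst_ip, dst_port). The
    endpoint whose port number is more common across all flows is taken
    to be the server; on a tie the destination is assumed to be the server.
    """
    flows = list(flows)
    port_counts = Counter()
    for _, sport, _, dport in flows:
        port_counts[sport] += 1
        port_counts[dport] += 1

    roles = []
    for src, sport, dst, dport in flows:
        if port_counts[sport] > port_counts[dport]:
            roles.append((dst, src, sport))   # source side is the server
        else:
            roles.append((src, dst, dport))   # destination side is the server
    return roles
```

The guess is only as good as the port statistics: two hosts talking on equally rare ports, as in some peer-to-peer traffic, will be labeled arbitrarily.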

Page 21: Network Flow Analysis
Page 22: Network Flow Analysis
Page 23: Network Flow Analysis

Strength Distribution

This is the distribution of the total number of octets in and out of each node.

Special problem for the client/server version of the network:

– If we direct all flows from server to client, what do we do when we only have a volume for the opposite direction?

– For now, I treat the network as being undirected for studying strength.
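
Under the undirected treatment, a node's strength is simply the sum of octets over every flow it takes part in. A minimal sketch, assuming hypothetical `(src_ip, dst_ip, octets)` triples:

```python
from collections import defaultdict


def node_strength(flows):
    """Total octets in and out of each node, ignoring flow direction.

    flows is an iterable of (src_ip, dst_ip, octets). Each flow's octets
    are credited to both endpoints (once only for a self-loop).
    """
    strength = defaultdict(int)
    for src, dst, octets in flows:
        strength[src] += octets
        if dst != src:
            strength[dst] += octets
    return dict(strength)
```

Because the volume is credited to both ends, it does not matter whether sampling happened to catch the client-to-server or the server-to-client half of a connection.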

Page 24: Network Flow Analysis
Page 25: Network Flow Analysis
Page 26: Network Flow Analysis

AS Numbers

An “autonomous system” is the basic building block of the Internet.

– An AS is responsible for its own interior routing.

– An AS is usually a large organization.

For example, IU has its own AS, as does AT&T.

Page 27: Network Flow Analysis
Page 28: Network Flow Analysis
Page 29: Network Flow Analysis
Page 30: Network Flow Analysis
Page 31: Network Flow Analysis

Top 10 ASes on Internet2

By degree

1. Hotmail

2. Microsoft

3. Microsoft-Europe

4. North Carolina (NCREN)

5. Michigan (MERIT)

6. University of Washington

7. MIT

8. UC-Berkeley

9. UMass

10. China (CERNET)

By strength

1. Abilene

2. University of Oregon

3. Hotmail

4. Microsoft

5. North Carolina (NCREN)

6. UCSD

7. UCLA

8. Michigan (MERIT)

9. University of Washington

10. UMass

Page 32: Network Flow Analysis

TCP Ports

Page 33: Network Flow Analysis
Page 34: Network Flow Analysis
Page 35: Network Flow Analysis
Page 36: Network Flow Analysis
Page 37: Network Flow Analysis

Top 10 TCP Ports on Internet2

By degree

1. Web

2. Gnutella

3. MS Messenger

4. SQL Server

5. Web (Encrypted)

6. Gnutella

7. Mail

8. Web Tunneling (8082)

9. BitTorrent

10. Usenet

By strength

1. Web

2. iperf

3. iperf

4. Usenet

5. RTP (Streaming)

6. iperf

7. SSH

8. BitTorrent

9. Port 388 ?!?

10. FTP

Page 38: Network Flow Analysis

Where Do I Go Next?

Start to look at the dynamics of the network.

Focus on individual ports.

Examine clustering coefficients.

Attempt to filter out spoofed traffic.

Consider the server-only and client-only networks.

– This will involve treating flows as edges in a bipartite graph.

Cluster nodes, ASes, and ports.
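
One way to read the bipartite idea in the list above: clients form one side of the graph, servers the other, and each flow is an edge between them. A "server-only" network could then be obtained by projection, linking two servers whenever they share a client. The slides do not define the construction, so the sketch below is only one plausible interpretation, with hypothetical names throughout.

```python
from collections import defaultdict
from itertools import combinations


def server_projection(pairs):
    """Project a client-server bipartite graph onto its servers.

    pairs is an iterable of (client, server) edges. Returns a dict mapping
    (server_a, server_b), with server_a < server_b, to the number of
    clients that contacted both servers.
    """
    servers_of = defaultdict(set)
    for client, server in pairs:
        servers_of[client].add(server)

    shared = defaultdict(int)
    for servers in servers_of.values():
        for a, b in combinations(sorted(servers), 2):
            shared[(a, b)] += 1
    return dict(shared)
```

The client-only network follows by swapping the roles of the two columns. A client that contacts many servers contributes a quadratic number of pairs, so some cap or sampling would likely be needed at the scale of this data.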

Page 39: Network Flow Analysis

Thank You!

Any thoughts, questions, comments, complaints, or observations are all welcome!