Automatically Inferring Patterns of Resource Consumption in Network Traffic

Cristian Estan, Stefan Savage, George Varghese, University of California, San Diego

Transcript of Automatically Inferring Patterns of Resource Consumption in Network Traffic

Page 1: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Automatically Inferring Patterns of Resource Consumption in Network Traffic

Cristian Estan, Stefan Savage, George Varghese

University of California, San Diego

Page 2: Automatically Inferring Patterns of Resource Consumption in Network Traffic

April 24, 2023 Traffic Clusters - 2003 2

Who is using my link?

Page 3: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Looking at the traffic

Too much data for a human

Do something smarter!

Page 4: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Looking at traffic aggregates

Aggregating on individual packet header fields gives useful results, but:

Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)

Single-field reports cannot show aggregates defined over multiple fields (e.g. which network uses which application)

The traffic analysis tool should automatically find aggregates over the right fields at the right granularity

Rank  Destination IP       Traffic
1     jeff.dorm.bigU.edu   11.9%
2     tracy.dorm.bigU.edu  3.12%
3     risc.cs.bigU.edu     2.83%

Most traffic goes to the dorms …

Rank  Destination network  Traffic
1     library.bigU.edu     27.5%
2     cs.bigU.edu          18.1%
3     dorm.bigU.edu        17.8%

What apps are used?

Rank  Source port  Traffic
1     Web          42.1%
2     Kazaa        6.7%
3     Ssh          6.3%

Where does the traffic come from? …

Which network uses web and which one kazaa?

[Figure: packet header fields available for aggregation: Src. IP, Src. net, Src. port, Dest. IP, Dest. net, Dest. port, Protocol]
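
A single-field report of the kind shown on this slide can be sketched in a few lines of Python. This is only an illustration: the packet records, the field names, and the helper `single_field_report` are hypothetical and do not come from the slides.

```python
from collections import Counter

def single_field_report(packets, field, top=3):
    """Rank the values of one header field by their share of total bytes."""
    volume = Counter()
    for pkt in packets:
        volume[pkt[field]] += pkt["bytes"]
    total = sum(volume.values())
    return [(value, count / total) for value, count in volume.most_common(top)]

# Hypothetical packet records (not taken from the slides).
packets = [
    {"dst_ip": "10.1.0.5", "src_port": 80, "bytes": 600},
    {"dst_ip": "10.1.0.5", "src_port": 22, "bytes": 100},
    {"dst_ip": "10.2.0.9", "src_port": 80, "bytes": 300},
]

by_dst = single_field_report(packets, "dst_ip")     # ranking by destination IP
by_port = single_field_report(packets, "src_port")  # ranking by source port
```

Each call looks at one field only, so neither ranking can say which destination uses which port. That is the limitation this slide points out.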

Page 5: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Ideal traffic report

Traffic aggregate                                           Traffic  Annotation
Web traffic                                                 42.1%    Web is the dominant application
Web traffic to library.bigU.edu                             26.7%    The library is a heavy user of web
Web traffic from www.schwarzenegger.com                     13.4%    That's a big flash crowd!
ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu  11.9%    This is a Denial of Service attack!!

This paper is about giving the network administrator insightful traffic reports

Page 6: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Contributions of this paper

Approach

Definitions

Algorithms

System

Experience

Page 7: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Approach

Characterize traffic mix by describing all important traffic aggregates

Multidimensional aggregates (e.g. flash crowd described by protocol, port number and IP address)

Aggregates at the right level of granularity (e.g. computer, subnet, ISP)

Traffic analysis is automated – finds insightful data without human guidance

Page 8: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Definition: traffic clusters

Traffic clusters are the multidimensional traffic aggregates identified by our reports

A cluster is defined by a range for each field

The ranges are from natural hierarchies (e.g. IP prefix hierarchy) – meaningful aggregates

Example

Traffic aggregate: incoming web traffic for CS Dept.

Traffic cluster: ( SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535] )
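
One way to picture the cluster definition above is as a record holding one range per header field, plus a membership test. The sketch below is an assumption for illustration, not the AutoFocus data structure: the class `TrafficCluster`, its field names, and the sample packet are all hypothetical. Only the ranges of the example cluster come from the slide.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class TrafficCluster:
    src_net: ipaddress.IPv4Network   # SrcIP range, expressed as a prefix
    dst_net: ipaddress.IPv4Network   # DestIP range, expressed as a prefix
    proto: Optional[str]             # None stands for "*"
    src_ports: Tuple[int, int]       # inclusive port range
    dst_ports: Tuple[int, int]       # inclusive port range

    def matches(self, pkt: dict) -> bool:
        """True if every header field of pkt falls inside the cluster's range."""
        return (
            ipaddress.ip_address(pkt["src_ip"]) in self.src_net
            and ipaddress.ip_address(pkt["dst_ip"]) in self.dst_net
            and (self.proto is None or pkt["proto"] == self.proto)
            and self.src_ports[0] <= pkt["src_port"] <= self.src_ports[1]
            and self.dst_ports[0] <= pkt["dst_port"] <= self.dst_ports[1]
        )

# The example cluster from the slide: incoming web traffic for the CS Dept.
incoming_cs_web = TrafficCluster(
    src_net=ipaddress.ip_network("0.0.0.0/0"),        # SrcIP=*
    dst_net=ipaddress.ip_network("132.239.64.0/21"),
    proto="TCP",
    src_ports=(80, 80),
    dst_ports=(1024, 65535),
)

# Hypothetical packet header used only to exercise the membership test.
pkt = {"src_ip": "198.51.100.7", "dst_ip": "132.239.65.10",
       "proto": "TCP", "src_port": 80, "dst_port": 40000}
inside = incoming_cs_web.matches(pkt)
```

Expressing the IP ranges as prefixes keeps every range on the natural hierarchy the slide mentions, so generalizing a cluster amounts to shortening a prefix.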

Page 9: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Definition: traffic report

Traffic reports give the volume of chosen traffic clusters

To keep report size manageable, describe only clusters above threshold (e.g. H = total traffic/20)

To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping more specific clusters in the report

To highlight non-obvious aggregates, prioritize by using unexpectedness label

Example
» 50% of all traffic is web
» Prefix B receives 20% of all traffic
» The web traffic received by prefix B is 15% instead of 50%*20%=10%, unexpectedness label is 15%/10%=150%
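
The arithmetic of the unexpectedness example can be reproduced with a short sketch. It assumes, as the example does, that the expected share of a cluster is the product of the shares of its single-field aggregates. The function name `unexpectedness` is hypothetical, and the paper may define the label more generally than this two-field case.

```python
def unexpectedness(cluster_share, field_shares):
    """Ratio of the observed share to the share expected if fields were independent."""
    expected = 1.0
    for share in field_shares:
        expected *= share
    return cluster_share / expected

# Slide example: web is 50% of traffic, prefix B receives 20%,
# and web traffic to prefix B is 15% of all traffic.
label = unexpectedness(0.15, [0.50, 0.20])
print(f"{label:.0%}")  # prints 150%
```

A label well above 100% marks a cluster that carries more traffic than its single-field aggregates would suggest, and a label well below 100% marks one that carries less.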

Page 10: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Contributions of this paper

Approach

Definitions

Algorithms

System

Experience

Page 11: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Algorithms and theory

Algorithms and theoretical bounds in the paper

Unidimensional reports are easy to compute

Multidimensional reports are exponentially harder as we add more fields

Next few slides

Example of unidimensional compression

Example for the structure of the multidimensional cluster space

Page 12: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Unidimensional report example

Hierarchy of source IP prefixes with their traffic, Threshold=100 (values at or above the threshold are marked with *)

/32:  10.0.0.2=15  10.0.0.3=35  10.0.0.4=30  10.0.0.5=40  10.0.0.8=160*  10.0.0.9=110*  10.0.0.10=35  10.0.0.14=75
/31:  10.0.0.2/31=50  10.0.0.4/31=70  10.0.0.8/31=270*  10.0.0.10/31=35  10.0.0.14/31=75
/30:  10.0.0.0/30=50  10.0.0.4/30=70  10.0.0.8/30=305*  10.0.0.12/30=75
/29:  10.0.0.0/29=120*  10.0.0.8/29=380*
/28:  10.0.0.0/28=500*

Page 13: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Unidimensional report example

Compression, applied to the clusters above the threshold:

/28:  10.0.0.0/28=500
/29:  10.0.0.0/29=120  10.0.0.8/29=380
/30:  10.0.0.8/30=305
/31:  10.0.0.8/31=270
/32:  10.0.0.8=160  10.0.0.9=110

305-270<100, so 10.0.0.8/30 can be inferred from 10.0.0.8 and 10.0.0.9 and is omitted

380-270≥100, so 10.0.0.8/29 stays in the report

Source IP    Traffic
10.0.0.0/29  120
10.0.0.8/29  380
10.0.0.8     160
10.0.0.9     110
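
The thresholding and compression steps of this example can be sketched as one bottom-up pass over the prefix hierarchy. This is a minimal sketch under stated assumptions, not the algorithm from the paper: the function name `unidimensional_report` and the `covered` bookkeeping are hypothetical. A prefix is reported when its traffic reaches the threshold and exceeds the traffic of the already reported, more specific prefixes inside it by at least the threshold.

```python
import ipaddress
from collections import defaultdict

def unidimensional_report(leaf_traffic, threshold, root_prefix_len):
    """Bottom-up thresholding and compression over the IP prefix hierarchy."""
    traffic = {ipaddress.ip_network(f"{ip}/32"): v for ip, v in leaf_traffic.items()}
    covered = {net: 0 for net in traffic}  # traffic already explained by reported sub-prefixes
    report = {}
    for prefix_len in range(32, root_prefix_len - 1, -1):
        for net, volume in traffic.items():
            # Thresholding, then compression: skip prefixes that can be
            # inferred (up to the threshold) from reported sub-prefixes.
            if volume >= threshold and volume - covered[net] >= threshold:
                report[str(net)] = volume
                covered[net] = volume
        if prefix_len == root_prefix_len:
            break
        # Move one level up: sum traffic and covered traffic into the parents.
        parent_traffic, parent_covered = defaultdict(int), defaultdict(int)
        for net, volume in traffic.items():
            parent = net.supernet()
            parent_traffic[parent] += volume
            parent_covered[parent] += covered[net]
        traffic, covered = parent_traffic, parent_covered
    return report

# Per-address traffic from the slide, with Threshold=100 and root 10.0.0.0/28.
leaves = {"10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
          "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75}
report = unidimensional_report(leaves, threshold=100, root_prefix_len=28)
```

On the slide's numbers the sketch keeps 10.0.0.8, 10.0.0.9, 10.0.0.0/29 and 10.0.0.8/29, and drops 10.0.0.8/31, 10.0.0.8/30 and 10.0.0.0/28, matching the compressed report above. Single addresses appear with a /32 suffix in the returned keys.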

Page 14: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Multidimensional structure ex.

[Figure: two hierarchies. Source net: All traffic splits into US and EU, US into CA and NY, EU into GB and DE. Application: All traffic splits into Web and Mail.]

Nodes (clusters) have multiple parents [figure labels: US Web, US, Web]

Nodes (clusters) overlap [figure label: CA]
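
The two properties named on this slide, multiple parents and overlap, can be checked on a small model of the figure. The sketch below is an assumption for illustration: the parent maps encode one reading of the figure (CA and NY under US, GB and DE under EU), `"*"` stands for "All traffic", and the helper names are hypothetical.

```python
# One reading of the two hierarchies in the figure; "*" stands for "All traffic".
SRC_PARENT = {"CA": "US", "NY": "US", "GB": "EU", "DE": "EU", "US": "*", "EU": "*"}
APP_PARENT = {"Web": "*", "Mail": "*"}

def parents(cluster):
    """Generalize exactly one field by one step; each field can yield a parent."""
    src, app = cluster
    result = []
    if src in SRC_PARENT:
        result.append((SRC_PARENT[src], app))
    if app in APP_PARENT:
        result.append((src, APP_PARENT[app]))
    return result

def contains(general, specific, parent_of):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific != general and specific in parent_of:
        specific = parent_of[specific]
    return specific == general

def overlap(a, b):
    """Two clusters overlap if their ranges are nested in every field."""
    def comparable(x, y, parent_of):
        return contains(x, y, parent_of) or contains(y, x, parent_of)
    return comparable(a[0], b[0], SRC_PARENT) and comparable(a[1], b[1], APP_PARENT)

us_web_parents = parents(("US", "Web"))
us_and_web_overlap = overlap(("US", "*"), ("*", "Web"))
```

In one dimension every node has a single parent and two nodes are either nested or disjoint. With two fields, `("US", "Web")` has two parents, and `("US", "*")` and `("*", "Web")` share traffic without either containing the other, which is why the unidimensional compression does not carry over directly.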

Page 15: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Contributions of this paper

Approach

Definitions

Algorithms

System

Experience

Page 16: Automatically Inferring Patterns of Resource Consumption in Network Traffic

System: AutoFocus

[Figure: AutoFocus architecture. Components: Traffic parser, Cluster miner, Grapher, Web based GUI. Other labels: Packet header trace, categories, names.]

Page 17: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Page 18: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Page 19: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Page 20: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Contributions of this paper

Approach

Definitions

Algorithms

System

Experience

Page 21: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Structure of regular traffic mix

[Figure labelled SD-NAP, with the following annotations]

Backups from CAIDA to tape server

Semi-regular time pattern

FTP from SLAC Stanford

Scripps web traffic

Web & Squid servers

Large ssh traffic

Steady ICMP probing from CAIDA

Page 22: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Analysis of unusual events

UCSD to UCLA route change

Sapphire/SQL Slammer worm

[Figure label: Site 2]

Page 23: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Conclusions

Page 24: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Conclusions

Multidimensional traffic clusters using natural hierarchies describe traffic aggregates

Traffic reports using thresholding identify automatically conspicuous resource consumption at the right granularity

Compression produces compact traffic reports and unexpectedness labels highlight non-obvious aggregates

Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events

Page 25: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Thank you!

Alpha version of AutoFocus downloadable from http://ial.ucsd.edu/AutoFocus/

Any questions?

Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker

Page 26: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Bounds and running times

Report size Running time Memory usage

unc. 1dim. rep. ≤1+(d-1)T/H O(n+m(d-1)) O(m(d-1))

1dim. report ≤ T/H linear linear

1dim. Δ report ≤T1/H+T2/H linear

unc. +dim. rep. ≤ T/H ∏di ≈result*n O(m+result)

+dim. rep. ≤ T/H ∏di/max(di)

+dim. Δ report ≈eresult

Page 27: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Open questions

Are there tighter bounds for the size of the reports?

Are there algorithms that produce smaller results?

Are there algorithms that compute traffic reports more efficiently? In streaming fashion?

Page 28: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Delta reports

Why repeat the same traffic report if the traffic doesn’t change from one day to the other?

Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the other

On related traffic mixes, delta reports are much smaller than traffic reports

Multidimensional compression is very hard for delta reports

We have only an exponential algorithm for the cluster delta
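
The basic idea of a delta report, before any compression, can be sketched as a comparison of per-cluster volumes from two intervals. This is an uncompressed illustration only: the function name `delta_report`, the use of `>=` for the threshold test, and the two sample days are assumptions, not data or code from the paper.

```python
def delta_report(before, after, threshold):
    """Clusters whose traffic rose or fell by at least `threshold` between intervals."""
    clusters = before.keys() | after.keys()
    changes = {c: after.get(c, 0) - before.get(c, 0) for c in clusters}
    # Assumption: a change equal to the threshold counts as significant.
    return {c: d for c, d in changes.items() if abs(d) >= threshold}

# Hypothetical per-cluster volumes for two consecutive days.
day1 = {"10.0.0.0/29": 120, "10.0.0.8/29": 380}
day2 = {"10.0.0.0/29": 130, "10.0.0.8/29": 150, "10.0.0.16/29": 200}

delta = delta_report(day1, day2, threshold=100)
```

Here 10.0.0.0/29 changes by only 10 and is left out, while the drop in 10.0.0.8/29 and the new traffic in 10.0.0.16/29 are both reported. When the two traffic mixes are related, most clusters fall below the threshold in this way.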

Page 29: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Greedy compression algorithm

Page 30: Automatically Inferring Patterns of Resource Consumption in Network Traffic

Multidimensional report example

Thresholding

Compression

Page 31: Automatically Inferring Patterns of Resource Consumption in Network Traffic

System details

Part     Language          LoC   Status
Backend  C++               5400  stable
GUI      HTML, Javascript  1000  functional
Glue     perl              350   evolving