Post on 17-Dec-2015

1

Revisiting the Case for a Minimalist Approach for Network Flow Monitoring

Vyas Sekar, Michael K Reiter, Hui Zhang

Many Monitoring Applications

Traffic Engineering

Analyze new user apps

Anomaly Detection

Network Forensics

Worm Detection

Accounting

Botnet analysis

…….

3

Need to estimate different metrics

Traffic Engineering

Analyze new user apps

Anomaly Detection

Network Forensics

Worm Detection

Accounting

Botnet analysis

…….

“Heavy-hitters”

“Degree histogram” “Entropy”, “Changes”

“SuperSpreaders”

“Flow size distribution”

4

How are these metrics estimated?

Traffic

Packet Processing

Counter Data Structures

Application-Level Metrics

Monitoring (on router)

Computation (off router)

5

Today’s solution: Packet Sampling

Traffic

Packet Processing

Counter Data Structures

Application-Level Metrics

Monitoring (on router)

Computation (off router)

Sample packets uniformly

FlowId Pkt/Byte Counts

Compute metrics on sampled flows

Estimation is inaccurate for fine-grained analysis

Extensive literature on limitations for many tasks!

Flow = Packets with same Src/Dst Addr and Ports

6

Trend: Shift to Application-Specific

Traffic

One pipeline per metric (Flow Size Distribution, Entropy, Superspreader, ….):

Packet Processing

Counter Data Structures

Application-Level Metric

Complexity: Need per-metric implementation

Early commitment: Applications are a moving target

11

What do we ideally want?

Traffic

Packet Processing

Counter Data Structures

Application-Specific Metrics

Monitoring (on router)

Computation (off router)

Simple

High accuracy

Support many applications

12

Outline

• Motivation

• A Minimalist Alternative

• Evaluation

• Summary and discussion

13

Requirements

Anomaly

Worm

Accounting

Botnet

1. Simple router implementation

2. General across applications

3. Enable drill-down capabilities

4. Network-wide views

14

How do we meet these requirements?

1. Simple router implementation

2. General across applications

3. Enable drill-down capabilities

4. Network-wide views

Delay binding to specific applications

15

What does it mean to delay binding?

Traffic

Packet Processing

Counter Data Structures

Application-Level Metrics

Monitoring (on router)

Computation (off router)

Instead of splitting resources, aggregate into generic primitives

Keep this stage as “generic” as possible

17

What Generic Primitives?

Two broad classes of monitoring tasks:

1. Communication structure, e.g., Who talked to whom?
Flow Sampling [Hohn, Veitch IMC ’03]

2. Volume structure, e.g., How much traffic?
Sample and Hold [Estan, Varghese SIGCOMM ’02]

18

Flow Sampling

Traffic

Packet Processing

Counter Data Structures

Hash(5-tuple); if hash < r, update

FlowId Pkt/Byte Counts

Flow = Packets with same Src/Dst Addr and Ports

Pick flows at random; not biased by flow size

Good for “communication” patterns
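The hash-and-threshold rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `flow_hash` and `flow_sample`, the use of SHA-256, and the packet format `(five_tuple, size_in_bytes)` are all assumptions made for the example.

```python
import hashlib

def flow_hash(five_tuple):
    """Map a flow key to a pseudo-uniform value in [0, 1) (hash choice is an assumption)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Keep 53 bits so the float division is exact and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def flow_sample(packets, r):
    """Update packet/byte counters only for flows whose hash falls below r."""
    table = {}
    for five_tuple, size in packets:
        if flow_hash(five_tuple) < r:
            pkts, byts = table.get(five_tuple, (0, 0))
            table[five_tuple] = (pkts + 1, byts + size)
    return table
```

Because the decision depends only on the flow key, every packet of a selected flow is counted, and a flow's chance of selection does not depend on its size.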

19

Sample and Hold

Traffic

Packet Processing

Counter Data Structures

FlowId Pkt/Byte Counts

Flow = Packets with same Src/Dst Addr and Ports

Accurate counts of large flows

Good for “volume” queries

If flow in table, update

Otherwise, sample with prob p; if sampled (new flow), create entry
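A minimal sketch of the per-packet rule on this slide, assuming packets arrive as `(flow_id, size_in_bytes)` pairs. The function name `sample_and_hold` and the optional `rng` argument are illustrative choices, and the sketch samples per packet as the slide states.

```python
import random

def sample_and_hold(packets, p, rng=None):
    """Once a flow has a table entry, count every later packet of that flow."""
    rng = rng or random.Random()
    table = {}
    for flow_id, size in packets:
        if flow_id in table:
            # Held flow: always update its counters.
            pkts, byts = table[flow_id]
            table[flow_id] = (pkts + 1, byts + size)
        elif rng.random() < p:
            # Flow not yet in the table: create an entry with probability p.
            table[flow_id] = (1, size)
    return table
```

A flow with many packets gets many chances to be sampled, so large flows are very likely to enter the table early and then be counted accurately.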

22

How do we meet these requirements?

1. Simple router implementation

2. General across applications

3. Enable drill-down capabilities

4. Network-wide views

Delay binding to specific applications

Generic primitives = FS (Flow Sampling), SH (Sample and Hold)

Retain NetFlow’s operational model

24

Retain NetFlow operational model

Application-Specific

FSD, Degree Histogram, Entropy

Summary Statistics

Difficult to do further analysis, e.g., why is X high?

Minimalist

FS+SH

Flow reports

FSD, Degree Histogram, Entropy

Can estimate new metrics!
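To illustrate how several metrics can be computed off-router from the same flow reports, here is a small sketch. It assumes reports are a dict mapping a 5-tuple `(src, dst, sport, dport, proto)` to `(packets, bytes)`; the function names are hypothetical, and the computations are naive empirical statistics on the reported flows, not the estimators used in the paper (which would also correct for sampling).

```python
import math
from collections import Counter, defaultdict

def flow_size_distribution(reports):
    """Histogram: flow size in packets -> number of reported flows."""
    return Counter(pkts for pkts, _bytes in reports.values())

def flow_entropy(reports):
    """Empirical entropy (bits) of how packets are spread across flows."""
    total = sum(pkts for pkts, _bytes in reports.values())
    if total == 0:
        return 0.0
    return -sum((pkts / total) * math.log2(pkts / total)
                for pkts, _bytes in reports.values() if pkts > 0)

def outdegree_histogram(reports):
    """Histogram: number of distinct destinations -> number of sources."""
    dsts = defaultdict(set)
    for (src, dst, *_rest) in reports:
        dsts[src].add(dst)
    return Counter(len(d) for d in dsts.values())
```

Since the raw flow records are retained, a new metric only needs a new function over the same reports, and an unusual value can be traced back to the flows that caused it.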

25

How do we meet these requirements?

1. Simple router implementation

2. General across applications

3. Enable drill-down capabilities

4. Network-wide views

Retain NetFlow’s operational model: keep flow reports

Network-wide resource management

Delay binding to specific applications

Generic primitives = FS, SH

26

Network-Wide Sample-and-Hold

[Figure: example network topology; routers labeled FS+SH, with “Sample-and-Hold” and “Flow Sampling” marked]

Repeating Sample-and-Hold wastes resources

Do it once per path

27

Network-Wide Flow Sampling

[Figure: example network topology illustrating Flow Sampling]

Use cSamp [NSDI ’08] to configure flow sampling capabilities

Hash-based coordination: non-overlapping sets of flows

Network-wide optimization: operator goals, e.g., per-path guarantee
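The idea of hash-based coordination can be sketched as follows: routers on a path receive disjoint slices of the hash space, so each flow is logged by at most one of them. This is only an illustration under stated assumptions. The names `assign_ranges` and `responsible_router` are hypothetical, and the per-router fractions are taken as given inputs, whereas cSamp derives them from a network-wide optimization.

```python
def assign_ranges(path, fractions):
    """Give each router on a path a disjoint slice [lo, hi) of the hash space [0, 1)."""
    ranges, lo = {}, 0.0
    for router in path:
        hi = lo + fractions[router]
        ranges[router] = (lo, hi)
        lo = hi
    if lo > 1.0 + 1e-12:
        raise ValueError("fractions on a path must sum to at most 1")
    return ranges

def responsible_router(flow_hash_value, ranges):
    """Return the router whose slice contains the flow's hash, or None if unsampled."""
    for router, (lo, hi) in ranges.items():
        if lo <= flow_hash_value < hi:
            return router
    return None
```

For example, with three routers each assigned a fraction of 0.25, a flow hashing to 0.3 is logged only by the second router, and a flow hashing to 0.9 is not logged on this path at all.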

28

Putting the pieces together: “Minimalist” Proposal

Traffic

FlowId Pkt/Byte Counts

Flow Sampling:
h ← Hash(flowid)
If h in FS_Range(path): Create/Update

Sample & Hold:
If Ingress(path):
    If flow in table: Update
    Else, with prob SH_p(path): Create (new flow)

FS_Range(path), SH_p(path) are configuration parameters, e.g., via network-wide optimization using cSamp+
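The per-packet logic of the combined proposal might look like the sketch below. Everything beyond the slide's pseudocode is an assumption made for illustration: the class name `MinimalistRouter`, the SHA-256 based `unit_hash`, the representation of `FS_Range(path)` as a `(lo, hi)` interval in [0, 1), and the choice to keep separate tables for the two primitives.

```python
import hashlib
import random

def unit_hash(flow_id):
    """Map a flow id to a pseudo-uniform value in [0, 1) (hash choice is an assumption)."""
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

class MinimalistRouter:
    def __init__(self, fs_range, sh_p, ingress_paths, rng=None):
        self.fs_range = fs_range                  # path -> (lo, hi) hash range
        self.sh_p = sh_p                          # path -> sampling probability
        self.ingress_paths = set(ingress_paths)   # paths for which this router is the ingress
        self.rng = rng or random.Random()
        self.fs_table = {}                        # flow_id -> (pkts, bytes)
        self.sh_table = {}                        # flow_id -> (pkts, bytes)

    @staticmethod
    def _bump(table, flow_id, size):
        pkts, byts = table.get(flow_id, (0, 0))
        table[flow_id] = (pkts + 1, byts + size)

    def on_packet(self, flow_id, path, size):
        # Flow Sampling: create/update if the hash falls in this router's range for the path.
        lo, hi = self.fs_range.get(path, (0.0, 0.0))
        if lo <= unit_hash(flow_id) < hi:
            self._bump(self.fs_table, flow_id, size)
        # Sample & Hold: only at the ingress of the path.
        if path in self.ingress_paths:
            if flow_id in self.sh_table:
                self._bump(self.sh_table, flow_id, size)
            elif self.rng.random() < self.sh_p.get(path, 0.0):
                self._bump(self.sh_table, flow_id, size)
```

The router itself holds no application-specific state; only the two configuration maps change when operator goals change.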

30

What do we ideally want?

Traffic

Packet Processing

Counter Data Structures

Application-Specific Metrics

Monitoring (on router)

Computation (off router)

Simple

High accuracy

Support many applications

?

31

Outline

• Motivation

• A Minimalist Alternative

• Evaluation
  – Compare FS+SH vs. application-specific

• Summary and discussion

32

Assumptions in resource normalization

• Hardware requirements are similar
  – Both need per-packet array/key-value updates
  – More than pkt sampling, but within router capabilities

• Processing costs
  – Online cost lower for minimalist (don’t need per-app-instance)
  – Offline cost is higher for minimalist (but can be reduced, if necessary)

• Reporting bandwidth
  – Higher for minimalist, but < 1% of network capacity

• Memory for counters
  – Bottleneck is SRAM (Flow headers can be offloaded to DRAM)
  – We conservatively assume 4X more per-counter cost for minimalist

34

Head-to-Head Comparison

Application Portfolio: Flow Size Distribution, Outdegree Histogram, …

Application-Specific: one instance per application (Flow Size Distribution + Outdegree Histogram + …)

Minimalist: FS+SH with the normalized aggregate SRAM, used to estimate FSD, Entropy, Degree

Relative accuracy difference = (Accuracy(Minimalist) - Accuracy(AppSpecific)) / Accuracy(AppSpecific)
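The comparison metric on this slide is a plain relative difference, which can be written as a one-line helper. The function name and the example accuracy values below are illustrative only, not results from the talk.

```python
def relative_accuracy_difference(acc_minimalist, acc_app_specific):
    """(Accuracy(Minimalist) - Accuracy(AppSpecific)) / Accuracy(AppSpecific)."""
    if acc_app_specific == 0:
        raise ValueError("application-specific accuracy must be non-zero")
    return (acc_minimalist - acc_app_specific) / acc_app_specific

# Positive means the minimalist approach is more accurate, negative means less.
print(relative_accuracy_difference(0.9, 0.75))
```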

35

Resource split between FS and SH

We pick 80-20 split as a good operating point

Relative difference is positive for most applications!

+ good, - bad

Run application-specific algorithms with recommended parameters (details in paper)

Measure memory use; run FS+SH with aggregate, but normalized (1/4X) memory

Packet trace from CAIDA; consistent over other traces
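The normalization step can be sketched as simple arithmetic: sum the memory measured for the application-specific algorithms, divide by the assumed 4X per-counter cost, then split the result between FS and SH. The function names and the example numbers are hypothetical, and the slide does not state which primitive receives the 80% share, so the split is left as a parameter.

```python
def normalized_budget(app_specific_memory, cost_factor=4):
    """Aggregate application-specific memory, scaled down by the per-counter cost factor."""
    return sum(app_specific_memory) / cost_factor

def split_budget(total, fs_fraction):
    """Split a budget between Flow Sampling and Sample and Hold."""
    if not 0.0 <= fs_fraction <= 1.0:
        raise ValueError("fs_fraction must be between 0 and 1")
    fs = total * fs_fraction
    return fs, total - fs

# Example with made-up memory measurements (arbitrary units).
budget = normalized_budget([400, 800, 400])
print(split_budget(budget, 0.8))
```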

36

Varying the application portfolio

Minimalist vs. Application-specific under same resources

[Figure: relative accuracy difference (y-axis; + good, - bad) vs. application portfolio (x-axis)]

With more tasks, or some resource-intensive tasks: better across entire portfolio!

“Sharing” effect across estimation tasks

Packet trace from CAIDA; consistent over other traces

37

Network-Wide View

Estimation (error metric)        Application-Specific   Uncoordinated FS+SH   Coordinated FS+SH
FSD (WMRD)                       0.16                   0.19                  0.02
Heavy Hitter (miss rate)         0.02                   0.3                   0.04
Entropy (relative error)         not available          0.03                  0.02
SuperSpreader (miss rate)        0.02                   0.04                  0.01
Deg. Histogram (JL-divergence)   0.15                   0.03                  0.02

Lower is better

Application-Specific: configured per-ingress, can’t get network-wide!

Uncoordinated FS+SH: introduces some biases due to duplicates

1. App-Specific: Difficult to generate different views, e.g., per-OD-pair

2. Coordination: better performance & operational simplicity

Flow-level traces from Internet2. Configure Application-Specific per PoP

Measure resource consumption, normalize and give to network-wide FS+SH
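For readers unfamiliar with two of the error metrics named in the table, the sketch below uses their commonly cited definitions: WMRD (weighted mean relative difference) between a true and an estimated histogram, and miss rate as the fraction of true items that were not reported. That the talk uses exactly these definitions is an assumption, and the function names are illustrative.

```python
def wmrd(true_hist, est_hist):
    """Sum of |n_i - est_i| divided by the sum of (n_i + est_i) / 2 over all bins."""
    keys = set(true_hist) | set(est_hist)
    num = sum(abs(true_hist.get(k, 0) - est_hist.get(k, 0)) for k in keys)
    den = sum((true_hist.get(k, 0) + est_hist.get(k, 0)) / 2 for k in keys)
    return num / den if den else 0.0

def miss_rate(true_items, reported_items):
    """Fraction of true items (e.g., heavy hitters) missing from the reported set."""
    true_items = set(true_items)
    if not true_items:
        return 0.0
    return len(true_items - set(reported_items)) / len(true_items)
```

Both are error measures, so lower values indicate better estimates.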

38

Conclusions and discussion

Even a simple “minimalist” approach might work

Key: Focus on portfolio rather than individual tasks

Proposal: FS + SH (complementary); cSamp-like mgmt

• Implications for device vendors and operators
  – Late binding, lower complexity

• Quest for feasibility, not optimality
  – Better primitives, combination, estimation?
  – Is this sufficient?