Anemone: Edge-based network management Mort (Richard Mortier) MSR-Cambridge December 2004.


Page 1

Anemone: Edge-based network management

Mort (Richard Mortier)

MSR-Cambridge

December 2004

Page 2

Network management

…is the process of monitoring and controlling a large, complex, distributed system of dumb devices where failures are common and resources scarce.

Enterprise networks are large but closely managed, in contrast with the Internet or university campus networks.

No-one has the big picture! Internet routeing uses distributed protocols.

Current management tools all consider only local information: patchy SNMP support, configuration issues, sampling artefacts, and the tools themselves generate CPU and network load.

Page 3

Anemone

Building an edge-based network management platform: collect flow information from hosts, and combine it with topology information from routeing protocols.

Enable visualization, analysis, simulation and control.

Avoid the problems of not-quite-standard interfaces: management support is typically ‘non-critical’ (i.e. buggy) and not extensively tested for inter-operability.

Do the work where resources are plentiful: hosts have lots of cycles and (relatively) little traffic.

Protocol visibility: see into tunnels, IPSec, etc.

Page 4

Problem context: enterprise networks

Large: 10^5 edge devices, 10^3 network devices.

Geographically distributed: multiple continents, 10^2 countries.

Tightly controlled: the IT department has (nearly) complete control over user desktops and network-connected equipment.

Page 5

Talk outline

System outline

What would it be good for?

In more detail…

Research issues

Page 6

System outline

[Figure: system outline diagram. Labelled elements: packets, flows, routeing protocol, topology, Anemone platform (traffic matrix over srcs × dsts; set of routes), simulator, visualize/simulate, control.]
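As a rough illustration of how flow data and routes fit together in the outline, the sketch below aggregates host flow reports into a {srcs} × {dsts} traffic matrix, then pushes each entry along its route to obtain per-link load. The record format `(src, dst, bytes)`, the reporting interval and the function names are hypothetical; routes are assumed to be node lists already derived from the routeing protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def traffic_matrix(records: Iterable[Tuple[str, str, int]], interval_s: float) -> Dict[Pair, float]:
    """Aggregate hypothetical (src, dst, bytes) flow records into bandwidths (bytes/s)."""
    totals: Dict[Pair, int] = defaultdict(int)
    for src, dst, nbytes in records:
        totals[(src, dst)] += nbytes
    return {pair: total / interval_s for pair, total in totals.items()}


def link_loads(matrix: Dict[Pair, float], routes: Dict[Pair, List[str]]) -> Dict[Pair, float]:
    """Add each matrix entry's bandwidth to every link on its route (a list of nodes)."""
    loads: Dict[Pair, float] = defaultdict(float)
    for pair, rate in matrix.items():
        path = routes[pair]
        for link in zip(path, path[1:]):
            loads[link] += rate
    return dict(loads)


records = [("a", "c", 3000), ("a", "c", 1000), ("b", "c", 2000)]
routes = {("a", "c"): ["a", "r1", "c"], ("b", "c"): ["b", "r1", "c"]}
matrix = traffic_matrix(records, interval_s=10.0)
print(matrix)                      # {('a', 'c'): 400.0, ('b', 'c'): 200.0}
print(link_loads(matrix, routes))  # {('a', 'r1'): 400.0, ('r1', 'c'): 600.0, ('b', 'r1'): 200.0}
```

Keeping the matrix and the routes separate means a topology change only requires recomputing `link_loads`, not re-collecting flow data.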

Page 7

Where is my traffic going today?

Pictures of current topology and traffic: routes + flows + forwarding rules, i.e. the BIG PICTURE.

In fact, where did my traffic go yesterday? Keep historical data for capacity planning, etc.

A platform for anomaly detection: historical data suggests what is “normal”, so live monitoring allows anomalies to be detected.
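One simple way to turn “historical data suggests normality” into a test is a z-score against a historical baseline. This is a swapped-in textbook technique, not something the talk specifies; the 3-standard-deviation threshold and the per-interval volumes are assumptions.

```python
import statistics
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations from the historical mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat history: anything different from it is unusual.
        return current != mean
    return abs(current - mean) / stdev > threshold


# Hypothetical per-interval traffic volumes (e.g. MB per hour) on one link.
history = [100, 110, 90, 105, 95]
print(is_anomalous(history, 130))  # True: about 4.2 standard deviations above the mean
print(is_anomalous(history, 108))  # False
```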

Page 8

Where might my traffic go tomorrow?

Plug into a simulator back-end, e.g. a discrete event simulator or a flow allocation solver.

Run multiple ‘what-if’ scenarios: …failures, …reconfigurations, …technology deployments.

E.g. “What happens if we coalesce all the Exchange servers in one data-centre?”
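A ‘what-if’ failure scenario can be sketched by recomputing shortest paths with some links removed and reporting which source/destination pairs change route. The graph representation, link weights and function names are hypothetical, and a real back-end would follow OSPF's own tie-breaking and equal-cost multipath rules, which this sketch ignores.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link weight}


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node list from src to dst, or None if unreachable."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def without_links(graph: Graph, failed: Set[Tuple[str, str]]) -> Graph:
    """Copy of the graph with the failed links removed in both directions."""
    return {u: {v: w for v, w in nbrs.items() if (u, v) not in failed and (v, u) not in failed}
            for u, nbrs in graph.items()}


def what_if(graph: Graph, pairs: Iterable[Tuple[str, str]], failed: Set[Tuple[str, str]]):
    """Map each pair whose route changes to (route before, route after); None means unreachable."""
    degraded = without_links(graph, failed)
    changes = {}
    for src, dst in pairs:
        before, after = shortest_path(graph, src, dst), shortest_path(degraded, src, dst)
        if before != after:
            changes[(src, dst)] = (before, after)
    return changes


graph = {"a": {"b": 1, "c": 2}, "b": {"a": 1, "d": 1},
         "c": {"a": 2, "d": 2}, "d": {"b": 1, "c": 2}}
print(what_if(graph, [("a", "d"), ("a", "c")], {("b", "d")}))
# {('a', 'd'): (['a', 'b', 'd'], ['a', 'c', 'd'])}
```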

Page 9

Where should my traffic be going?

Close the loop: compute link weights to implement policy goals, recomputing on the order of hours/days.

This allows more dynamic policies: modify the network configuration to track, e.g., time-of-day load changes.

Make the network more efficient (~cheaper)?
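To make “compute link weights to implement policy goals” concrete, here is a toy greedy local search whose goal is to minimise the maximum link utilisation: it repeatedly raises the weight of the most utilised link and remembers the best setting seen. This is a hypothetical heuristic chosen for illustration, not the method proposed in the talk; it assumes directed links, single shortest paths and that every demand stays reachable.

```python
import heapq
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def spf_path(weights: Dict[Link, int], src: str, dst: str) -> List[str]:
    """Dijkstra over directed, weighted links; assumes dst is reachable from src."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def max_utilisation(weights, capacity, demands) -> Tuple[float, Link]:
    """Route every demand on its shortest path; return (highest load/capacity, that link)."""
    load = {link: 0.0 for link in weights}
    for (src, dst), rate in demands.items():
        path = spf_path(weights, src, dst)
        for link in zip(path, path[1:]):
            load[link] += rate
    hottest = max(load, key=lambda link: load[link] / capacity[link])
    return load[hottest] / capacity[hottest], hottest


def tune_weights(weights, capacity, demands, max_steps: int = 20):
    """Toy local search: keep raising the hottest link's weight; remember the best setting seen."""
    current = dict(weights)
    best_util, hottest = max_utilisation(current, capacity, demands)
    best = dict(current)
    for _ in range(max_steps):
        current[hottest] += 1
        util, hottest = max_utilisation(current, capacity, demands)
        if util < best_util:
            best, best_util = dict(current), util
    return best, best_util


weights = {("a", "c"): 1, ("a", "b"): 1, ("b", "c"): 1}
capacity = {("a", "c"): 5, ("a", "b"): 10, ("b", "c"): 10}
demands = {("a", "c"): 8}
best, util = tune_weights(weights, capacity, demands)
print(util, spf_path(best, "a", "c"))  # 0.8 ['a', 'b', 'c']
```

Here the direct a→c link is overloaded (8 units on capacity 5); raising its weight moves the demand onto the two larger links.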

Page 10

Where are we now?

Three major components: flow collection, route collection, and the Anemone platform.

Studying feasibility and building prototypes.

Page 11

Data collection: flows

Hosts track active flows using ETW (Event Tracing for Windows), a low-overhead event posting infrastructure.

Built a prototype device-driver provider and user-space consumer.

Used 24h packet traces from (client, server) for the feasibility study: peaks of (165, 5667) live and (39, 567) active flows per second.
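The feasibility figures distinguish live from active flows. The transcript does not define the terms, so the sketch below assumes a flow is *active* in a given second if it sent a packet during that second, and *live* if its latest packet falls within an idle timeout. It does not use ETW; the input is a plain list of hypothetical (timestamp, flow key) events standing in for what a host-side provider might post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def live_and_active(packets: Iterable[Tuple[float, str]], idle_timeout_s: float = 60.0) -> List[Tuple[int, int, int]]:
    """Return (second, live flows, active flows) for each whole second covered by the trace.

    Assumed definitions: a flow is *active* in a second if it sent a packet during it, and
    *live* if its most recent packet is no more than idle_timeout_s before the second ends.
    """
    pkts = sorted(packets)
    if not pkts:
        return []
    active: Dict[int, Set[str]] = defaultdict(set)
    for ts, flow in pkts:
        active[int(ts)].add(flow)
    last_seen: Dict[str, float] = {}
    result, i = [], 0
    for sec in range(int(pkts[0][0]), int(pkts[-1][0]) + 1):
        end = sec + 1
        while i < len(pkts) and pkts[i][0] < end:
            ts, flow = pkts[i]
            last_seen[flow] = ts
            i += 1
        live = sum(1 for t in last_seen.values() if end - t <= idle_timeout_s)
        result.append((sec, live, len(active[sec])))
    return result


# Hypothetical (timestamp in seconds, flow key) events from a host-side provider.
trace = [(0.1, "f1"), (0.5, "f2"), (1.2, "f1"), (3.0, "f3")]
per_second = live_and_active(trace, idle_timeout_s=2.0)
print(per_second)  # [(0, 2, 2), (1, 2, 1), (2, 1, 0), (3, 1, 1)]
print(max(live for _, live, _ in per_second), max(act for _, _, act in per_second))  # 2 2
```

The peak values printed last are the same kind of statistic as the (live, active) peaks quoted above.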

Page 14

Data collection: routes

OSPF is link-state, so collect link state adverts (LSAs). This is similar to the Sprint IS-IS collection, and was also done at AT&T (NSDI’04 paper).

Completely passive (modulo configuration). Process the data to recover network “events” and topology.

Data collected for the (local, backbone) areas over 20 days:
LSA DB size: (700, 1048) LSAs, ~(21, 34) kB.
Event totals: (2526, 3238) events, ~(5.3, 6.7) events/hr.

Small, generally stable with bursts of activity
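Recovering network “events” from collected LSAs can be sketched as a diff between successive snapshots of the link-state database. OSPFv2 identifies an LSA by (LS type, Link State ID, Advertising Router); the simplified contents and the rule that a pure refresh (new sequence number, same contents) is not an event are assumptions. The last line re-derives the per-hour rates from the 20-day totals above.

```python
from typing import Dict, FrozenSet, List, Tuple

# OSPFv2 identifies an LSA by (LS type, Link State ID, Advertising Router).
LsaKey = Tuple[int, str, str]
# Hypothetical simplified contents: (LS sequence number, set of advertised neighbours).
LsaValue = Tuple[int, FrozenSet[str]]


def diff_lsdb(old: Dict[LsaKey, LsaValue], new: Dict[LsaKey, LsaValue]) -> List[Tuple[str, LsaKey]]:
    """Classify differences between two LSDB snapshots as network "events".

    A periodic refresh (new sequence number, same contents) is not treated as an event.
    """
    events = [("added", k) for k in sorted(new.keys() - old.keys())]
    events += [("withdrawn", k) for k in sorted(old.keys() - new.keys())]
    events += [("changed", k) for k in sorted(old.keys() & new.keys()) if old[k][1] != new[k][1]]
    return events


def events_per_hour(n_events: int, days: float) -> float:
    return n_events / (days * 24)


r1, r2, r3 = (1, "10.0.0.1", "10.0.0.1"), (1, "10.0.0.2", "10.0.0.2"), (1, "10.0.0.3", "10.0.0.3")
old = {r1: (0x80000001, frozenset({"10.0.0.2"})), r2: (0x80000001, frozenset({"10.0.0.1"}))}
new = {r1: (0x80000002, frozenset({"10.0.0.2"})),               # refresh only
       r2: (0x80000002, frozenset({"10.0.0.1", "10.0.0.3"})),  # new adjacency
       r3: (0x80000001, frozenset({"10.0.0.2"}))}               # new router
print(diff_lsdb(old, new))  # [('added', (1, '10.0.0.3', '10.0.0.3')), ('changed', (1, '10.0.0.2', '10.0.0.2'))]
print(round(events_per_hour(2526, 20), 1), round(events_per_hour(3238, 20), 1))  # 5.3 6.7
```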

Page 15

NB: Spike to ~100 from initial DB collection truncated for readability

Page 17

[Figure panels: steady state; complete dataset. Annotated features with candidate explanations: 10 mins (data ca. 25/Nov?); 30 mins (LSRefreshTime?); 35 mins (LSRefreshTime+CheckAge?); 1–2 mins (RouterDeadInterval?).]
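The timer annotations suggest looking at gaps between successive copies of the same LSA: if routers re-originate each LSA every LSRefreshTime (30 minutes in OSPFv2), gaps should cluster at 30 minutes. A minimal sketch, assuming arrivals are available as hypothetical (timestamp in seconds, LSA key) pairs; the one-minute binning is an arbitrary choice, and this is not necessarily how the plotted data was produced.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def interarrival_histogram(arrivals: Iterable[Tuple[float, str]], bin_s: float = 60.0) -> Counter:
    """Count gaps between successive arrivals of the same LSA, rounded to whole bins (minutes by default)."""
    times: Dict[str, List[float]] = defaultdict(list)
    for ts, key in arrivals:
        times[key].append(ts)
    hist: Counter = Counter()
    for ts_list in times.values():
        ts_list.sort()
        for earlier, later in zip(ts_list, ts_list[1:]):
            hist[round((later - earlier) / bin_s)] += 1
    return hist


# Hypothetical arrivals: "A" refreshed twice then re-originated 2 minutes later, "B" refreshed once.
arrivals = [(0, "A"), (1800, "A"), (3600, "A"), (100, "B"), (1900, "B"), (3720, "A")]
print(interarrival_histogram(arrivals))  # Counter({30: 3, 2: 1})
```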

Page 18

The Anemone platform

A “distributed database”, logically containing:

1. The traffic flow matrix (bandwidths), {srcs} × {dsts}. Hosts can supply the flows they source and sink, so only a subset of this data is needed to obtain the complete traffic matrix.

2. …with each entry annotated with its current route, src to dst. Note that src/dst might be e.g. (IP end-point, application). OSPF supplies the topology → routes.

Where/what/how much to distribute/aggregate? Is the data read- or write-dominated? Which is more dynamic, flow or topology data? Can the system successfully self-tune?
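Because every flow has two endpoints, a (src, dst) entry can be filled in if either end is monitored. The sketch below counts how much of the matrix a given set of monitored hosts covers. It assumes host-level granularity (ignoring the (IP end-point, application) refinement); the function names are hypothetical.

```python
from itertools import product
from typing import Set, Tuple


def observable_entries(hosts: Set[str], monitored: Set[str]) -> Set[Tuple[str, str]]:
    """Traffic matrix entries (src, dst) seen by at least one monitored endpoint."""
    return {(s, d) for s, d in product(hosts, repeat=2)
            if s != d and (s in monitored or d in monitored)}


def coverage(hosts: Set[str], monitored: Set[str]) -> float:
    """Fraction of off-diagonal matrix entries that some monitored host can report."""
    total = len(hosts) * (len(hosts) - 1)
    return len(observable_entries(hosts, monitored)) / total


hosts = {"a", "b", "c", "d"}
print(len(observable_entries(hosts, {"a", "b"})))  # 10: only c->d and d->c are invisible
print(coverage(hosts, {"a", "b"}))                 # 0.8333333333333334
```

When both ends are monitored each entry is reported twice; one simple convention, for example, is to let only the source report.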

Page 19

The Anemone platform

We wish to be able to answer queries like:

“Who are the top-10 traffic generators?” Easy to aggregate; we don’t care about topology.

“What is the load on link l?” Can aggregate from hosts, but we need to know the routes.

“What happens if we remove links {l…m}?” Involves interaction between the traffic matrix, the topology, even flow control.

Related work: {distributed, continuous query, temporal} databases; sensor networks, Astrolabe, SDIMS, PHI …
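The first query needs no topology: per-source byte counts can be summed in any order, so hosts and intermediate aggregators can each forward a partial summary. A minimal sketch; the record format and the two-site arrangement are assumptions, not the platform's actual protocol.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def local_summary(records: Iterable[Tuple[str, str, int]]) -> Counter:
    """What one host or aggregator might report: bytes sent per source."""
    summary: Counter = Counter()
    for src, _dst, nbytes in records:
        summary[src] += nbytes
    return summary


def top_generators(partials: Iterable[Counter], k: int = 10) -> List[Tuple[str, int]]:
    """Merge partial summaries (addition is order-independent) and return the k largest sources."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total.most_common(k)


site1 = local_summary([("a", "x", 5), ("b", "y", 3)])
site2 = local_summary([("a", "z", 2), ("c", "x", 9)])
print(top_generators([site1, site2], k=2))  # [('c', 9), ('a', 7)]
```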

Page 20

The Anemone platform

Building a simulation model: OSPF data gives the topology, event list and routes; a simple load model to start with (load ~ # subnets); the predecessor matrix (from SPF) reduces the flow-data query set.

Can we do as well as, or better than, e.g. NetFlow? Accuracy/coverage trade-off.

How should we distribute the data, and by what protocols? Just OSPF data? Just flow data? A mixture?

How many levels of aggregation? How many nodes do queries touch?

What sort of API is suitable? Example queries for sample applications.
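One reading of “the predecessor matrix (from SPF) reduces the flow-data query set”: from each source's shortest-path tree one can list the (src, dst) pairs whose route crosses a given link, so a query about that link's load need only ask those sources for flow data. The sketch assumes single shortest paths with positive weights and ignores equal-cost multipath; all names are hypothetical.

```python
import heapq
from typing import Dict, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbour: link weight}


def spf_tree(graph: Graph, src: str) -> Dict[str, str]:
    """One row of the predecessor matrix: node -> previous hop on the shortest path from src."""
    dist, pred, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return pred


def pairs_using_link(graph: Graph, link: Tuple[str, str]) -> Set[Tuple[str, str]]:
    """(src, dst) pairs whose shortest path traverses the directed link (u, v)."""
    u, v = link
    pairs = set()
    for src in graph:
        pred = spf_tree(graph, src)
        for dst in pred:
            node = dst
            while node != src:  # walk back along the SPF tree towards src
                if node == v and pred[node] == u:
                    pairs.add((src, dst))
                    break
                node = pred[node]
    return pairs


line = {"a": {"b": 1}, "b": {"a": 1, "c": 1}, "c": {"b": 1}}
relevant = pairs_using_link(line, ("a", "b"))
print(sorted(relevant))                      # [('a', 'b'), ('a', 'c')]
print(sorted({src for src, _ in relevant}))  # ['a']: the only host that needs querying
```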

Page 21

Research issues

Corner cases

Scalability

Robustness, accuracy

Control systems

Page 22

Research issues

Corner cases: multi-homed hosts (how best to define a flow?); L4 routeing, NAT, proxy ARP, transparent proxies (solve using device config files, perhaps SNMP).

Scalability: host measurement must not be intrusive (in terms of packet latency, CPU load, network bandwidth). Aggregators must elect themselves in such a way that they do not implode under event load. What happens if the network radically alters? E.g. extensive use of multicast, or connection patterns shifting due to e.g. P2P deployment.

Page 23

Research issues

Robustness: network management had better still work as nodes fail or the network partitions!

Accuracy in the face of late, partial information: by accident (unmonitored hosts); by design (aggregation, more detail about the local area); inference of link contribution to cumulative metrics, e.g. RTT.

Network control: modify link weights. How efficient is the current configuration anyway? What are plausible timescales to reconfigure?
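Inferring each link's contribution to a cumulative metric such as RTT can be posed as a linear system if one assumes per-link delays simply add along a path (a network-tomography style assumption, not something the talk commits to). The sketch treats each measured round trip as a set of links and solves for per-link delays by least squares in plain Python; path and link names are hypothetical, and real measurements would be noisy and often rank-deficient.

```python
from typing import Dict, Sequence


def infer_link_delays(paths: Sequence[Sequence[str]], rtts: Sequence[float],
                      links: Sequence[str]) -> Dict[str, float]:
    """Least-squares per-link delays x such that, for each path, the sum of its links' x ~= its RTT.

    Solves the normal equations (A^T A) x = A^T b, where A is the path/link incidence
    matrix; assumes enough independent paths that A has full column rank.
    """
    idx = {link: i for i, link in enumerate(links)}
    n = len(links)
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for path, rtt in zip(paths, rtts):
        cols = [idx[link] for link in path]
        for i in cols:
            atb[i] += rtt
            for j in cols:
                ata[i][j] += 1.0
    # Gaussian elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(ata[r][c]))
        ata[c], ata[p] = ata[p], ata[c]
        atb[c], atb[p] = atb[p], atb[c]
        for r in range(c + 1, n):
            f = ata[r][c] / ata[c][c]
            for k in range(c, n):
                ata[r][k] -= f * ata[c][k]
            atb[r] -= f * atb[c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (atb[r] - sum(ata[r][k] * x[k] for k in range(r + 1, n))) / ata[r][r]
    return {link: x[idx[link]] for link in links}


# Hypothetical RTT measurements (ms) over paths described by the links they cross.
paths = [["l1"], ["l1", "l2"], ["l2", "l3"], ["l1", "l2", "l3"]]
rtts = [10.0, 30.0, 50.0, 60.0]
est = infer_link_delays(paths, rtts, ["l1", "l2", "l3"])
print({link: round(d, 6) for link, d in est.items()})  # {'l1': 10.0, 'l2': 20.0, 'l3': 30.0}
```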

Page 24

Summary

Aim to build a coherent edge-based network management platform using flow monitoring and standard routeing protocols. Applications include visualization, simulation and dynamic control.

Research issues include:

Scalability: want to manage a 300,000 node network.

Robustness: must work as nodes fail or the network partitions.

Accuracy: will not be able to monitor 100% of traffic.

Control systems: use the data to optimize the network in real time, as well as just observe and simulate.

Page 25

Current status

Submitted a Networking 2005 paper; prototype ETW provider/consumer driver; studied feasibility of flow monitoring; prototype OSPF collector and topology reconstruction.

Investigating the “distributed database” via simulation: query properties, system decomposition, protocols for data distribution.

Questions, comments?

Page 26

Backup slides

SNMP, Internet routeing, OSPF, BGP, Security

Page 27

SNMP

Protocol to manage information tables at devices. Provides get, set, trap and notify operations:
get, set: read, write values
trap: signal a condition (e.g. threshold exceeded)
notify: reliable trap

Complexity lies mostly in the table design: some standard tables, but many vendor-specific ones. Management is non-critical, so tables are often populated incorrectly.

Page 28

Internet routeing

Q: how to get a packet from a node to its destination?

A1: advertise all reachable destinations and apply a consistent cost function (distance vector).

A2: learn the network topology and compute consistent shortest paths (link state). Each node (1) discovers and advertises adjacencies; (2) builds a link state database; (3) computes shortest paths.

A1, A2: forward to the next hop using longest-prefix match.
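Answer A1 can be sketched as synchronous distance-vector rounds (Bellman-Ford style) on a static topology. This ignores timers, triggered updates and the count-to-infinity problem that real protocols must handle after failures; node names and costs are hypothetical.

```python
from typing import Dict

INF = float("inf")


def distance_vector(links: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Synchronous distance-vector rounds: each node's cost to d is the minimum over
    neighbours of (link cost + the neighbour's advertised cost to d)."""
    nodes = list(links)
    table = {n: {d: (0 if d == n else INF) for d in nodes} for n in nodes}
    for _ in range(len(nodes)):  # enough rounds to converge on a static topology
        new = {n: {d: min([0 if d == n else INF] +
                          [cost + table[nbr][d] for nbr, cost in links[n].items()])
                   for d in nodes}
               for n in nodes}
        if new == table:
            break
        table = new
    return table


links = {"a": {"b": 1, "c": 5}, "b": {"a": 1, "c": 2}, "c": {"a": 5, "b": 2}}
print(distance_vector(links)["a"])  # {'a': 0, 'b': 1, 'c': 3}
```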

Page 29

OSPF (~link state routeing)

Q: how to route a given packet from any node to its destination?

A: learn the network topology; compute shortest paths.

For each node: discover adjacencies (~immediate neighbours) and advertise them; build a link state database (~network topology); compute shortest paths to all destination prefixes; forward to the next hop using longest-prefix match (~most specific route).
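Forwarding by longest-prefix match picks the most specific prefix that contains the destination address. A minimal sketch using Python's standard `ipaddress` module; the forwarding table and next-hop names are hypothetical, and the linear scan stands in for the specialised lookup structures (tries, TCAMs) that real routers use.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(fib: Dict[str, str], addr: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    best_len, best_hop = -1, None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


fib = {"0.0.0.0/0": "upstream", "10.0.0.0/8": "r1", "10.1.0.0/16": "r2"}
print(longest_prefix_match(fib, "10.1.2.3"))   # r2
print(longest_prefix_match(fib, "10.9.9.9"))   # r1
print(longest_prefix_match(fib, "192.0.2.1"))  # upstream
```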

Page 30

BGP (~path vector routeing)

Q: how to route a given packet from any node to its destination? A: neighbours tell you the destinations they can reach; pick the cheapest option.

For each node: receive (destination, cost, next-hop) for all destinations known to each neighbour; select among all possible next-hops for a given destination; advertise the selected (destination, cost+, next-hop') for all known destinations. The selection process is complicated.

Routes can be modified/hidden at all three stages: a general mechanism for the application of policy.
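The receive/select/advertise loop can be sketched with AS-path length standing in for “cost”, loop detection by rejecting paths that already contain our own AS, and a deny-list as a stand-in for import policy. This is a deliberately simplified model: real BGP selection also weighs local preference, origin, MED and more, and export policy is omitted. The AS numbers and prefixes are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

AsPath = Tuple[int, ...]


def select_best(my_as: int, received: Dict[int, List[Tuple[str, AsPath]]],
                deny: FrozenSet[str] = frozenset()) -> Dict[str, Tuple[int, AsPath]]:
    """Per prefix, pick the neighbour offering the shortest loop-free AS path.

    `deny` is an illustrative import policy that hides prefixes; ties go to the
    lower neighbour AS number. Real BGP's decision process has many more steps.
    """
    best: Dict[str, Tuple[int, AsPath]] = {}
    for neighbour, routes in received.items():
        for prefix, as_path in routes:
            if my_as in as_path or prefix in deny:  # loop detection / import filter
                continue
            current = best.get(prefix)
            if current is None or (len(as_path), neighbour) < (len(current[1]), current[0]):
                best[prefix] = (neighbour, as_path)
    return best


def advertise(my_as: int, best: Dict[str, Tuple[int, AsPath]]) -> List[Tuple[str, AsPath]]:
    """Export each selected route with our own AS prepended (the path-vector form of cost+)."""
    return [(prefix, (my_as,) + as_path) for prefix, (_nbr, as_path) in sorted(best.items())]


received = {
    65002: [("10.0.0.0/8", (65002, 65010))],
    65003: [("10.0.0.0/8", (65003,)), ("192.0.2.0/24", (65003, 65001))],
}
best = select_best(65001, received)
print(best)                    # {'10.0.0.0/8': (65003, (65003,))}
print(advertise(65001, best))  # [('10.0.0.0/8', (65001, 65003))]
```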

Page 31

Security

Threat: malicious/compromised host. Authenticate participants; the route collector must be secured as if it were a router.

Threat: DoS on monitors. How do we tell the difference between a client under DoS and a server? Rate-pace output from monitors.

Threat: eavesdropping. Standard IPSec/encryption solutions.
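Rate-pacing monitor output could be realised with a token bucket, a standard rate limiter (the talk does not say which mechanism it intends). In this sketch a monitor may emit a report only when a token is available; the rate and burst values are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Permit bursts of up to `capacity` reports, refilled at `rate` reports per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=2.0)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(0.5))                      # True: half a second refills one token
```

A monitor under a flood of events then degrades to a bounded report rate instead of amplifying the attack onto the management network.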