Anemone Edge-based network management research.microsoft/projects/anemone

Anemone: Edge-based network management
http://www.research.microsoft.com/projects/anemone/
Mort (Richard Mortier), Paul Barham, Austin Donnelly, Rebecca Isaacs


Page 1: Anemone Edge-based network management research.microsoft/projects/anemone

Anemone
Edge-based network management
http://www.research.microsoft.com/projects/anemone/

Mort (Richard Mortier)

Paul Barham, Austin Donnelly, Rebecca Isaacs

Page 2: Anemone Edge-based network management research.microsoft/projects/anemone

Preamble: Microsoft Research

• Over 700 people worldwide, spread through 6 research labs
  – Bangalore, Beijing, Cambridge, Redmond, San Francisco, Silicon Valley
  – Cover a wide range of CS and EE areas
• MSR Charter
  – Advance the state-of-the-art through cutting-edge research and publishing in the open literature
  – Provide competitive edge to Microsoft’s product groups through technology transfer and consultation
  – Engage with academic community through participation in conferences, programme committees, journal editorial boards, student thesis committees
• Cambridge lab is about 80 researchers, split into 4 main areas
  – Networking, systems, distributed systems
    • Magpie, Topology discovery, Pastry, Avalanche, Vigilante, Anemone
  – Languages, security, theory
  – Graphics, vision, machine learning
  – Integrated systems, HCI, hardware

Page 3: Anemone Edge-based network management research.microsoft/projects/anemone

Network management is hard!

The process of monitoring and controlling a large, complex distributed system of dumb devices where failures are common and resources scarce

• Networks are large: 10^5 hosts, 10^3 routers
• Networks are heterogeneous: 130 router hardware/OS combinations
• Networks run distributed protocols: OSPF, BGP, all very loosely synchronized
• Networks undergo continuous change: links fail and recover, upgrades occur

Page 4: Anemone Edge-based network management research.microsoft/projects/anemone

State of the art?

Tools to help visualize and inspect network

1. Get topology
  – Recursive use of ping and traceroute
2. Get traffic data
  – Routers using SNMP and NetFlow™
3. Analyze and present the data
  – Wrap it all up in a GUI: triggers, graphs, top-10s, etc.

Page 5: Anemone Edge-based network management research.microsoft/projects/anemone

Unfortunately…

There are problems!

• Traffic is becoming more opaque to the network core
  – Increasing deployment of IPSec, tunnelling, encryption
• traceroute data is ambiguous and only polls the topology
  – Best case is the reverse path anyway
• SNMP data is often buggy
  – Non-critical part of router operation
• Routers are often resource starved
  – Not built using the latest CPU, memory technologies
• The result is that such systems can end up presenting inaccurate, untimely, incomplete data

Page 6: Anemone Edge-based network management research.microsoft/projects/anemone

Anemone

Edge-based distributed network management platform

• Collect flow information from hosts, and
• Combine with topology information from routing protocols

Enables applications

• Visualize current network state
• Analyse flow data for intrusion detection
• Simulate reconfiguration/failure for planning
• Control the network, automatically and in real-time

Page 7: Anemone Edge-based network management research.microsoft/projects/anemone

Benefits

Anemone has a priori benefits over state of the art

• Visibility into opaque protocols
  – See into encrypted/tunnelled traffic, e.g. IPSec, PPTP
• Plentiful resources at hosts
  – They need only deal with their own traffic
• Independence from poor quality data
  – No more reliance on SNMP and traceroute data

Page 8: Anemone Edge-based network management research.microsoft/projects/anemone

Applications

Where is my traffic going today?
Anemone is a platform for network management apps

• Pictures of current topology and traffic
  – Routes+flows+forwarding rules BIG PICTURE
• In fact, where did my traffic go yesterday?
  – Keep historical data for capacity planning, etc.
• A platform for anomaly detection
  – Historical data suggests “normality,” live monitoring allows anomalies to be detected

Page 9: Anemone Edge-based network management research.microsoft/projects/anemone

Applications

Where might my traffic go tomorrow?
Anemone enables ‘what-if’ analysis

• Plug into a simulator back-end
  – Discrete event simulator or flow allocation solver
• Run multiple ‘what-if’ scenarios
  – …failures
  – …reconfigurations
  – …technology deployments

E.g. “What happens to the network if we coalesce all the mail servers into one datacenter?”

Page 10: Anemone Edge-based network management research.microsoft/projects/anemone

Applications

Where should my traffic be going?
Anemone helps close the control loop

• Use it to support an application that recomputes link weights to implement policy goals
  – Recomputation on the order of hours or days
• This enables more dynamic policies
  – Network configuration could be modified to track e.g. time of day/week/year load changes
• …potentially reducing bandwidth costs

Page 11: Anemone Edge-based network management research.microsoft/projects/anemone

Where are we now?

Studying feasibility and building prototypes

• Three major components
  – Flow collection
  – Route collection
  – Anemone platform

Page 12: Anemone Edge-based network management research.microsoft/projects/anemone

Data collection: flows

Synthesise flow data from low-level packet tracing

• Hosts track active flows
  – Using ETW, a low-overhead event posting infrastructure
  – Built prototype device driver provider and user-space consumer
• Took 24h packet traces from a client and a server
  – Peak live flows per sec: 165 (client), 5667 (server)
  – Peak active flows per sec: 39 (client), 567 (server)
• Datasets of quite manageable size
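The idea of synthesising flows from a packet trace can be sketched as follows. This is a minimal illustration, not the prototype's ETW provider/consumer: the `Packet` record, the 5-tuple flow key, the `IDLE_TIMEOUT` value and the function name are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp (seconds) plus the usual 5-tuple and size.
Packet = namedtuple("Packet", "ts src dst sport dport proto size")

IDLE_TIMEOUT = 60.0  # assumed: a flow is closed after this many silent seconds


def synthesise_flows(packets, idle_timeout=IDLE_TIMEOUT):
    """Group time-ordered packets into flow records keyed by 5-tuple.

    Returns a list of (key, first_ts, last_ts, packet_count, byte_count).
    """
    open_flows = {}  # 5-tuple -> [first_ts, last_ts, packet_count, byte_count]
    finished = []
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = open_flows.get(key)
        if flow is not None and p.ts - flow[1] > idle_timeout:
            # The old flow went idle: close it and start a new one.
            finished.append((key, *flow))
            flow = None
        if flow is None:
            flow = [p.ts, p.ts, 0, 0]
            open_flows[key] = flow
        flow[1] = p.ts
        flow[2] += 1
        flow[3] += p.size
    # Flush flows still open at the end of the trace.
    finished.extend((key, *flow) for key, flow in open_flows.items())
    return finished
```

Only the per-flow summaries leave the host, which is one plausible reason the resulting datasets stay small relative to the raw trace.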

Page 13: Anemone Edge-based network management research.microsoft/projects/anemone
Page 14: Anemone Edge-based network management research.microsoft/projects/anemone
Page 15: Anemone Edge-based network management research.microsoft/projects/anemone

Interlude: OSPF routing 101

How does a packet get from any A to any B?
Learn network topology; compute shortest paths

• For each node
  1. Discover adjacencies (~immediate neighbours)
  2. Advertise these link states to all other routers
  3. Build link state database (~network topology)
  4. Compute shortest paths to all destination prefixes
  5. Forward to next-hop using longest-prefix-match (~most specific route)
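Steps 3 to 5 above can be sketched in a few lines. This is an illustrative sketch only, assuming the link state database is already a plain dictionary of link costs; the function names and data layout are hypothetical and ignore OSPF areas, LSA types and equal-cost multipath.

```python
import heapq
import ipaddress


def shortest_paths(lsdb, source):
    """Dijkstra over a link state database {router: {neighbour: cost}}.

    Returns {router: (total_cost, first_hop)} for every reachable router.
    """
    best = {source: (0, None)}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry
        first_hop = best[node][1]
        for nbr, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            hop = nbr if node == source else first_hop
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, nbr))
    return best


def next_hop(forwarding_table, address):
    """Longest-prefix match: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(address)
    matches = [(net, hop) for net, hop in forwarding_table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Toy topology: A is attached to B and D; C sits behind B.
lsdb = {"A": {"B": 1, "D": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}, "D": {"A": 1}}
paths = shortest_paths(lsdb, "A")

# Prefixes advertised by routers (made-up addresses for the example).
advertised = {"10.0.0.0/8": "D", "10.1.0.0/16": "C"}
table = {ipaddress.ip_network(p): paths[r][1] for p, r in advertised.items()}
```

Here `next_hop(table, "10.1.2.3")` selects the /16 entry (first hop `B`), while `next_hop(table, "10.2.0.1")` falls back to the /8 entry (first hop `D`).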

Page 16: Anemone Edge-based network management research.microsoft/projects/anemone

Data collection: routes

Passive collection of network-critical control protocol

• OSPF is link-state so collect link state adverts
• Completely passive, modulo configuration
• Process data to recover network “events” and topology
• Data collected for (local, backbone) areas (20 days)
  – LSA DB size: (700, 1048) LSAs ~ (21, 34) kB
  – Event totals: (2526, 3238) events ~ (5.3, 6.7) evts/hr
• Small, generally stable with bursts of activity
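The quoted event rates follow directly from the totals and the 20-day collection period. The short check below uses only the figures on the slide; the variable and function names are illustrative.

```python
COLLECTION_HOURS = 20 * 24  # 20 days of collection

# Event totals from the slide, per OSPF area.
event_totals = {"local": 2526, "backbone": 3238}


def events_per_hour(total_events, hours=COLLECTION_HOURS):
    """Average event rate over the whole collection period."""
    return total_events / hours


rates = {area: round(events_per_hour(n), 1) for area, n in event_totals.items()}
```

These are averages over the whole period; since the data is described as stable with bursts, the instantaneous rate will often be well away from the mean.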

Page 17: Anemone Edge-based network management research.microsoft/projects/anemone

NB: Spike to ~100 from initial DB collection truncated for readability

Page 18: Anemone Edge-based network management research.microsoft/projects/anemone

[Figure not reproduced. Panels: steady state; complete dataset. Annotations: 1–2 mins: RouterDeadInterval?; 10 mins: data ca. 25/Nov?; 30 mins: LSRefreshTime?; 35 mins: LSRefreshTime+CheckAge?]

Page 19: Anemone Edge-based network management research.microsoft/projects/anemone

The Anemone platform

Data unification, distribution and presentation

• “Distributed database,” logically containing
  1. Traffic flow matrix (bandwidths), {srcs} × {dsts}
    • Hosts can supply flows they source and sink
    • Only need a subset of this data to get complete traffic matrix
  2. …each entry annotated with current route, src to dst
    • Note src/dst might be e.g. (IP end-point, application)
    • OSPF supplies topology → routes
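Because each flow is visible at both its source and its sink, reports from a subset of hosts can still yield the full matrix. The sketch below illustrates that idea under simplifying assumptions: host reports are plain dictionaries, both ends report the same bandwidth, and `route_of` is a hypothetical stand-in for a lookup derived from OSPF data.

```python
def build_traffic_matrix(host_reports, route_of):
    """Merge per-host flow reports into one route-annotated matrix.

    host_reports: {reporting_host: {(src, dst): bandwidth}}
    route_of:     callable (src, dst) -> list of nodes on the current route
    Returns {(src, dst): {"bandwidth": ..., "route": [...]}}.
    """
    matrix = {}
    for flows in host_reports.values():
        for (src, dst), bandwidth in flows.items():
            # A flow may be reported by both ends; keep a single entry.
            if (src, dst) not in matrix:
                matrix[(src, dst)] = {
                    "bandwidth": bandwidth,
                    "route": route_of(src, dst),
                }
    return matrix


def direct_route(src, dst):
    """Placeholder route lookup for the example: a single direct hop."""
    return [src, dst]


# h1 reports everything it sources and sinks; h2 does not report at all.
reports_from_h1_only = {"h1": {("h1", "h2"): 10, ("h2", "h1"): 3}}
reports_from_both = {
    "h1": {("h1", "h2"): 10, ("h2", "h1"): 3},
    "h2": {("h1", "h2"): 10, ("h2", "h1"): 3},
}
```

In this two-host case, `build_traffic_matrix` returns the same matrix for both report sets, which is the sense in which only a subset of the data is needed.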

Page 20: Anemone Edge-based network management research.microsoft/projects/anemone

System outline

[Diagram not reproduced. Labels: Hosts (Packets, Flows); Routers (Routeing protocol, Topology); Anemone platform (Traffic matrix: srcs × dsts; Set of routes); Visualize, Simulate, Simulator, Control.]

Page 21: Anemone Edge-based network management research.microsoft/projects/anemone

The Anemone platform

Provides an API for presenting data

• Wish to be able to answer queries like
  – “Who are the top-10 traffic generators?”
    • Easy to aggregate, don’t care about topology
  – “What is the load on link l?”
    • Can aggregate from hosts, but need to know routes
  – “What happens if we remove links {l…m}?”
    • Interaction between traffic matrix, topology, even flow control
• Related work
  – { distributed, continuous query, temporal } databases
  – Sensor networks, Astrolabe, SDIMS, PHI …
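The first two example queries can be sketched against an in-memory traffic matrix. This is a centralised toy, not the platform's API (which the slides leave open): the function names and the dictionary layout are assumptions for illustration.

```python
from collections import Counter


def top_generators(matrix, n=10):
    """Top-n traffic sources by total bandwidth; no topology needed.

    matrix: {(src, dst): bandwidth}
    """
    totals = Counter()
    for (src, _dst), bandwidth in matrix.items():
        totals[src] += bandwidth
    return totals.most_common(n)


def link_load(matrix, routes, link):
    """Total bandwidth of flows whose route crosses the directed link (u, v).

    routes: {(src, dst): [node, node, ...]}
    """
    load = 0
    for pair, bandwidth in matrix.items():
        path = routes[pair]
        if link in zip(path, path[1:]):
            load += bandwidth
    return load


matrix = {("h1", "h3"): 5, ("h2", "h3"): 7, ("h1", "h2"): 1}
routes = {
    ("h1", "h3"): ["h1", "r1", "r2", "h3"],
    ("h2", "h3"): ["h2", "r2", "h3"],
    ("h1", "h2"): ["h1", "r1", "r2", "h2"],
}
```

The asymmetry the slide points out shows up directly: `top_generators` never touches `routes`, whereas `link_load` cannot be answered without them. The third query would additionally need routes recomputed on the modified topology.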

Page 22: Anemone Edge-based network management research.microsoft/projects/anemone

The Anemone platform

Currently forming the core of the demo!

• Have simulation model
  – OSPF data gives topology, event list, routes
  – Simple load model to start with (load ~ # subnets)
  – Predecessor matrix (from SPF) reduces flow-data query set
• Where/what/how much to distribute/aggregate?
  – Is data read- or write-dominated?
  – Which is more dynamic, flow or topology data?
  – Can the system successfully self-tune?
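One plausible reading of the predecessor-matrix point is that the SPF output already says which (src, dst) pairs route over a given link, so a link-load query need only ask for those flows. The sketch below illustrates that reading; it is an interpretation rather than the demo's code, and the data layout and function names are hypothetical.

```python
def path_from_predecessors(pred, src, dst):
    """Rebuild the src -> dst path from a predecessor matrix.

    pred[src][node] is the node preceding `node` on the shortest path from src.
    Returns None if dst is unreachable from src.
    """
    path = [dst]
    while path[-1] != src:
        prev = pred[src].get(path[-1])
        if prev is None:
            return None
        path.append(prev)
    return path[::-1]


def pairs_crossing(pred, link):
    """All (src, dst) pairs whose shortest path uses the directed link (u, v)."""
    crossing = set()
    for src, preds in pred.items():
        for dst in preds:
            path = path_from_predecessors(pred, src, dst)
            if path and link in zip(path, path[1:]):
                crossing.add((src, dst))
    return crossing


# Line topology A - B - C, as a predecessor matrix.
pred = {
    "A": {"B": "A", "C": "B"},
    "B": {"A": "B", "C": "B"},
    "C": {"B": "C", "A": "B"},
}
```

For the link (B, C) only the pairs (A, C) and (B, C) qualify, so flow data for the other four pairs need not be fetched.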

Page 23: Anemone Edge-based network management research.microsoft/projects/anemone

The Anemone platform

Many outstanding research questions

• Can we do as well/better than e.g. NetFlow™?
  – Accuracy of data vs. completeness of instrumentation
• Which data sets should we distribute and how?
  – Just OSPF data? Just flow data? A mixture?
  – Use DHTs? IP multicast?
• How many levels of aggregation?
  – How many nodes should a query touch?
• What sort of API is suitable?
  – Example queries for sample applications

Page 24: Anemone Edge-based network management research.microsoft/projects/anemone

http://www.research.microsoft.com/projects/anemone/

Building a coherent edge-based network management platform using flow monitoring and standard routeing protocols

• Applications include visualization, simulation, dynamic control
• Research issues include
  – Accuracy: will not be able to monitor 100% of traffic
  – Scalability: want to manage a 300,000 node network
  – Robustness: must work as nodes fail or network partitions
  – Control systems: use the data to optimize the network in real-time, as well as just observe and simulate

Page 25: Anemone Edge-based network management research.microsoft/projects/anemone

Backup slides

• SNMP

• Internet routeing

• Security

Page 26: Anemone Edge-based network management research.microsoft/projects/anemone

SNMP

Protocol to manage information tables at devices

• Provides get, set, trap, notify operations
  – get, set: read, write values
  – trap: signal a condition (e.g. threshold exceeded)
  – notify: reliable trap
• Complexity mostly in the table design
  – Some standard tables, but many vendor specific
  – Non-critical, so often tables populated incorrectly

Page 27: Anemone Edge-based network management research.microsoft/projects/anemone

Internet routeing

• Q: how to get a packet from node to destination?
• A1: advertise all reachable destinations and apply a consistent cost function (distance vector)
• A2: learn network topology and compute consistent shortest paths (link state)
  – Each node (1) discovers and advertises adjacencies; (2) builds link state database; (3) computes shortest paths
• A1, A2: Forward to next-hop using longest-prefix-match
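Answer A1 can be illustrated with a small distance-vector sketch in which each node repeatedly merges its neighbours' advertised tables. It assumes symmetric links with positive costs and runs the exchanges in a single loop until nothing changes; a real protocol runs asynchronously and needs extra machinery (e.g. against count-to-infinity) that is omitted here. All names are illustrative.

```python
def distance_vector(links):
    """Iterate neighbour table exchanges until no routing table changes.

    links: {node: {neighbour: cost}} with every neighbour also a key.
    Returns {node: {destination: (cost, next_hop)}}.
    """
    tables = {node: {node: (0, None)} for node in links}
    changed = True
    while changed:
        changed = False
        for node, nbrs in links.items():
            for nbr, link_cost in nbrs.items():
                # nbr advertises every destination it can reach, with its cost.
                for dest, (cost, _hop) in list(tables[nbr].items()):
                    new_cost = link_cost + cost
                    known = tables[node].get(dest)
                    if known is None or new_cost < known[0]:
                        tables[node][dest] = (new_cost, nbr)
                        changed = True
    return tables


# Triangle where the direct A-C link is more expensive than going via B.
links = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
tables = distance_vector(links)
```

After convergence, A reaches C via B at cost 2 rather than over the direct cost-5 link, without any node ever holding the full topology.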

Page 28: Anemone Edge-based network management research.microsoft/projects/anemone

Security

• Threat: malicious/compromised host
  – Authenticate participants
  – Must secure route collector as if a router
• Threat: DoS on monitors
  – Difference between client under DoS and server?
  – Rate pace output from monitors
• Threat: eavesdropping
  – Standard IPSec/encryption solutions
• Have not considered cross-domain implications
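Rate pacing of monitor output could be done in many ways; the slides do not say which. As one common option, a token bucket is sketched below. The class, its parameters and the choice of technique are assumptions for illustration, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Token-bucket pacer: allows `rate` reports/sec with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens that can accumulate
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous call

    def allow(self, now, cost=1.0):
        """Return True if a report may be sent at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=1.0, burst=2.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

With these settings a monitor can send two reports back to back, is then held off, and regains one report's worth of credit per second, so a host under attack cannot flood the collectors with flow records.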