Transcript of Revised PPT

Page 1: Revised PPT

22 Mar 2005

Finding Network Problems that Influence Applications: Measurement Tools

Matt Zekauskas, [email protected]
Georgia Performance Workshop
DRAFT DRAFT for comment DRAFT DRAFT

Page 2: Revised PPT

22-Mar-2005 2 Finding Network Problems: Measurement Tools

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 3: Revised PPT

We Would Like Your Help

What problems are you experiencing?

Have you used a good tool?

Give us the benefit of your experience: successful problem resolution!

Page 4: Revised PPT

What Are The Problems? (1)

Packet loss

Jitter

Out-of-order packets (extreme jitter)

Duplicated packets

Excessive latency
• Interactive applications
• TCP’s control system

Page 5: Revised PPT

For TCP

Eliminating loss is the goal

Non-congestive losses especially tricky

TCP: 100 Mbit Ethernet coast-to-coast:
• Full size packets… need 10^-6 Ploss [Mathis]
• Less than 1 loss every 83 seconds
http://www.psc.edu/~mathis/papers/JTechs200105/

GigE: 10^-8, 1 loss every 497 seconds
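
The 100 Mbit figure can be checked against the Mathis et al. steady-state model, rate = (MSS / RTT) * C / sqrt(p). The sketch below is illustrative only: the 1500-byte packet size, the 70 ms round-trip time, and the constant C = 0.7 are assumptions chosen for the example, not values stated on the slide.

```python
def max_loss_probability(rate_bps, mss_bytes, rtt_s, c=0.7):
    """Largest loss probability p that still allows rate_bps under the
    Mathis model: rate = (MSS / RTT) * C / sqrt(p)."""
    mss_bits = mss_bytes * 8
    return (mss_bits * c / (rtt_s * rate_bps)) ** 2


def seconds_between_losses(rate_bps, mss_bytes, p):
    """Average time between losses when sending full-size packets at rate_bps."""
    packets_per_second = rate_bps / (mss_bytes * 8)
    return 1.0 / (p * packets_per_second)


# Assumed parameters: 100 Mbit/s, 1500-byte packets, 70 ms RTT, C = 0.7.
p = max_loss_probability(100e6, 1500, 0.070)
interval = seconds_between_losses(100e6, 1500, p)
print(f"p <= {p:.2e}, about one loss every {interval:.0f} s")
```

Under these assumptions the bound comes out near 1.4 * 10^-6, or roughly one loss every 83 seconds. Other link speeds or packet sizes can be tried by changing the arguments.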

Page 6: Revised PPT

What Are The Problems? (2)

TCP: lack of buffer space
• Forces protocol into stop-and-wait
• Number one TCP-related performance problem.
• 70 ms * 1 Gbps = 70*10^6 bits, or about 8.3 MB
• 70 ms * 100 Mbps = 7*10^6 bits, or about 855 KB
• Many stacks default to 64 KB, or about 7.5 Mbps
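
The buffer arithmetic is the bandwidth-delay product: the sender must keep RTT * rate bits in flight to fill the path, and a fixed window caps throughput at window / RTT. A minimal sketch using the slide's 70 ms round-trip time; the function names are made up for illustration and sizes are in binary units (1 KiB = 1024 bytes):

```python
def bdp_bytes(rtt_s, rate_bps):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rtt_s * rate_bps / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Best throughput when at most one window can be sent per round trip."""
    return window_bytes * 8 / rtt_s


RTT = 0.070  # 70 ms round-trip time, as on the slide

for label, rate in [("1 Gbps", 1e9), ("100 Mbps", 100e6)]:
    print(f"{label}: buffer needed = {bdp_bytes(RTT, rate) / 1024:.0f} KiB")

default_window = 64 * 1024  # assumed 64 KiB default socket buffer
limit = window_limited_rate_bps(default_window, RTT)
print(f"64 KiB window: at most {limit / 1e6:.1f} Mbps")
```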

Page 7: Revised PPT

What Are The Problems? (3)

Video/Audio: lack of buffer space
• Makes broadcast streams very sensitive to previous problems

Application behaviors
• Stop-and-wait behavior; can’t stream
• Lack of robustness to network anomalies

Page 8: Revised PPT

The Usual Suspects

Host configuration errors (TCP buffers)

Duplex mismatch (Ethernet)

Wiring/Fiber problem

Bad equipment

Bad routing

Congestion
• “Real” traffic
• Unnecessary traffic (broadcasts, multicast, denial of service attacks)

Page 9: Revised PPT

JPL/Caltech – GSFC

The situation
• Using Abilene
• Tuned hosts
• Things work locally

Therefore it MUST be Abilene
• Tests show good flows router-router
• Intermediate tests point towards CA

Bad fiber connection!

Page 10: Revised PPT

Strategy

Most problems are local…

Test ahead of time!

Is there connectivity & reasonable latency? (ping -> OWAMP)

Is routing reasonable? (traceroute)

Is host reasonable? (NDT; Web100)

Is path reasonable? (iperf -> BWCTL)

Page 11: Revised PPT

One Technique: Problem Isolation via Divide and Conquer
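
One way to picture divide and conquer is a binary search over test points along the path. The sketch below is an illustration under stated assumptions, not a description of any particular tool: `path_ok` is a hypothetical callback standing in for a real test (for example a throughput or loss test from the source to that point), the host names are made up, and a single fault is assumed, so every point beyond the fault also fails.

```python
def first_bad_hop(test_points, path_ok):
    """Binary-search an ordered list of test points for the first one that
    fails when tested from the source end.

    test_points: measurement hosts ordered from near the source to the far end.
    path_ok(point): hypothetical callback; True if a test from the source to
                    `point` performs acceptably.
    Returns the index of the first failing point, or None if all pass.
    """
    lo, hi = 0, len(test_points)
    while lo < hi:
        mid = (lo + hi) // 2
        if path_ok(test_points[mid]):
            lo = mid + 1      # fault lies beyond this point
        else:
            hi = mid          # fault is at or before this point
    return lo if lo < len(test_points) else None


# Illustrative path with made-up names; the fault sits just before "regional".
points = ["campus-edge", "gigapop", "backbone-west",
          "backbone-east", "regional", "far-host"]
broken_from = points.index("regional")
result = first_bad_hop(points, lambda p: points.index(p) < broken_from)
print(points[result])
```

With six test points, three tests are enough to narrow the fault to the segment between "backbone-east" and "regional".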

Page 12: Revised PPT

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 13: Revised PPT

Internet2 Detective

A simple “is there any hope” tool
• Windows “tray” application
• Red/green lights, am I on Internet2
• Multicast available
• IPv6 available

http://detective.internet2.edu/

Page 14: Revised PPT

NLANR Performance Advisor

Geared for the naive user

Run at both ends, and see if a standard problem is detected.

Can also work with intermediate servers

http://dast.nlanr.net/Projects/Advisor

Page 15: Revised PPT

NDT

Network Diagnostic Tool

Java applet

Connects to server in middle, runs tests, and evaluates heuristics looking for host and first mile problems.

Has detailed output.

You’ll see lots of detail later today.

A commercial tool that tests for TCP buffer problems: http://www.dslreports.com/tweaks/

Page 16: Revised PPT

Host/OS Tuning: Web100

Goal: TCP stack, tuning not bottleneck

Large measurement component
• TCP performance not what you expect? Ask TCP why!
– Receiver bottleneck (out of receiver window)
– Sender bottleneck (no data to send)
– Path bottleneck (out of congestion window)
– Path anomalies (duplicate, out of order, loss)

www.web100.org

Page 17: Revised PPT

Reference Servers (Beacons)

H.323 conferencing
• Goal: portable machines that tell you if system likely to work (and if not, why?)
• Moderate-rate UDP of interest
• E.g., H.323 Beacon http://www.osc.edu/oarnet/itecohio.net/beacon/
• ViDeNet Scout, http://scout.video.unc.edu/

Page 18: Revised PPT

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 19: Revised PPT

OWAMP – Latency/Loss

One-Way Active Measurement Protocol

Requires NTP-Synchronized clocks

Look for one-way latency, loss

Authentication and Scheduling

Again, lots more later today

Page 20: Revised PPT

BWCTL – Throughput

A tool for throughput testing that includes scheduling and authentication.

Currently uses iperf for actual tests.

Can assign users (or IP addresses) to classes, give classes different throughput limits or time limits.

Periodic and on-demand testing.

Lots more later today.

Page 21: Revised PPT

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 22: Revised PPT

Some Commercial Tools

Caveat: only a partial list, give me more!

Spirent (nee Netcom/Adtech):
• SmartBits: test at low & high rates, QoS; test components or end-to-end path

NetIQ: Chariot/Pegasus

Agilent (like SmartBits, and FireHunter)

Ixia (like SmartBits/Spirent)

Brix Networks (like AMP/OWAMP, for ‘QoS’)

Apparent Networks: path debugger

Page 23: Revised PPT

Some Noncommercial Tools

Iperf: dast.nlanr.net/Projects/iperf
• See also http://www-itg.lbl.gov/nettest/
• http://www-didc.lbl.gov/NCS/

Flowscan:
• http://www.caida.org/tools/utilities/flowscan/
• http://net.doit.wisc.edu/~plonka/FlowScan/

SLAC’s traceroute perl script:
• http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html

One large list:
• http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html

Page 24: Revised PPT

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 25: Revised PPT

Abilene: Measurements from the Center

Active (latency, throughput)
• Measurement within Abilene
• Measurements to the edge

Passive
• SNMP stats (esp. core Abilene links)
• Variables via router proxy
• Router configuration
• Route state
• Characterization of traffic
– Netflow; OCxMON

Page 26: Revised PPT

Goal

Abilene goal to be an exemplar
• Measurements open
• Tests possible to router nodes
• Throughput tests routinely through backbone
• …as well as existing utilization, etc.
• The “Abilene Observatory” http://abilene.internet2.edu/observatory

Page 27: Revised PPT

Abilene: Machines

GigE connected high-performance tester
• bwctl, “nms1”, 9000 byte MTU

Latency tester
• owamp, “nms4”, 100bT

Stats collection
• SNMP, flow-stats, “nms3”, 100bT

Ad-hoc tests
• NDT server, “nms2”, gigE, 1500 byte MTU

Page 28: Revised PPT

Throughput

Take tests 1/hr, 20 seconds each
• IPv4 TCP
• IPv6 TCP (no discernible difference)
• IPv4 UDP (on our platforms flaky at 1G)
• IPv6 UDP (ditto)

Others test to our nodes

Others test amongst themselves

Net result: 25% of traffic (NOT capacity) is measurement

Page 29: Revised PPT

Latency

CDMA used to synchronize NTP
• www.endruntechnologies.com

Test among all router node pairs

10/sec

IPv4 and IPv6

Minimal sized packets

Poisson schedule
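
A Poisson schedule means the gaps between test packets are drawn from an exponential distribution, so packets are sent at an average rate without a fixed period. A minimal sketch of generating such send times at the slide's 10 packets per second; the function name, the 60-second span, and the seed are arbitrary choices for illustration, and this is not the code of any particular measurement tool.

```python
import random


def poisson_schedule(rate_per_s, duration_s, seed=None):
    """Return send times (seconds from start) for a Poisson process:
    gaps between packets are exponentially distributed with mean 1/rate."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


# 10 packets per second on average, as on the slide; 60 s is an arbitrary span.
send_times = poisson_schedule(10, 60, seed=1)
gaps = [round(b - a, 3) for a, b in zip(send_times, send_times[1:4])]
print(len(send_times), "packets; first gaps (s):", gaps)
```

Because the gaps are random, the probes do not lock in step with periodic events on the network, which a fixed-interval schedule can do.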

Page 30: Revised PPT

Passive - Utilization

The Abilene NOC takes
• Packets in, out
• Bytes in, out
• Drops/Errors
• …for all interfaces, publishes internal links & peering points (at 5 min intervals)
• …via SNMP polling – every 60 sec

http://loadrunner.uits.iu.edu/weathermaps/abilene/abilene.html
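
Polled byte counters become a utilization figure by differencing two samples and dividing by the polling interval and link speed. The sketch below assumes 32-bit counters that may wrap once between polls; the sample values and the function name are made up for illustration.

```python
def utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Fraction of link capacity used between two byte-counter samples.

    Handles a single counter wrap by taking the difference
    modulo 2**counter_bits.
    """
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / (interval_s * link_bps)


# Made-up samples 60 s apart on a 100 Mbit/s link; the second has wrapped past 2**32.
used = utilization(4_294_000_000, 150_000_000, 60, 100e6)
print(f"{used:.1%}")
```

A 32-bit byte counter wraps after about 34 seconds at 1 Gbps, so on fast links a 60-second poll needs 64-bit counters (pass `counter_bits=64`) to stay unambiguous.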

Page 32: Revised PPT

Abilene Pointers

http://www.abilene.iu.edu/
• Monitoring
• Tools

http://www.itec.oar.net/abilene-netflow

http://netflow.internet2.edu/weekly/ (summaries)

Page 33: Revised PPT

Outline

Problems, typical causes, diagnostic strategies

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

End-to-End Measurement Infrastructure

Page 34: Revised PPT

End-to-End Measurement Infrastructure Vision

Ongoing monitoring to test major elements, and end-to-end paths.

• Elements: gigaPoP links, peering, …
• Utilization
• Delay
• Loss
• Occasional throughput
• Multicast connectivity

Page 35: Revised PPT

End-to-End Measurement Infrastructure Vision II

Many more end to end paths than can be monitored.

Diagnostic tools available on-demand (with authorization)

• Show routes
• Perform flow tests (perhaps app tests)
• Parse/debug flows (à la tcpdump or OCxMON with heuristic tools)

Page 36: Revised PPT

What Campuses Can Do

Export SNMP data
• I have an “Internet2 list”, can add you
• Monitor loss as well as throughput

Performance test point at campus edge
• Hopefully, the result of today’s workshop
• Possibly also traceroute “looking glass”
• Commercial (e.g., NetIQ) complements
• We have a master list

Page 37: Revised PPT

Strategy (references) (1)

See also
• http://e2epi.internet2.edu/ Look at stories, documents, tools
• http://e2epi.internet2.edu/ndt/ Pointer to the tool, and using it for debugging the last mile

Page 38: Revised PPT

Strategy (references) (2)

• http://www.psc.edu/networking/projects/tcptune/ How to tweak OS parameters (also scp pointer)
• http://www.ncne.org/research/tcp/ TCP debugging the detailed way
• http://dast.nlanr.net/Guides/WritingApps/ Tips for app writers
• http://dast.nlanr.net/Guides/GettingStarted And some checking to do by hand & debugging.

Page 39: Revised PPT

Acknowledgements

The original presentation was by Matt Zekauskas, using ideas inspired by material from NLANR DAST, Matt Mathis, and others.

Copyright Internet2 2005, All Rights Reserved.

Your mileage may vary. Caveat Emptor. It’s a desert topping and a floor wax. They all do that. It’s a feature. It’s wafer-thin. Sleep is for the weak. Coffee won’t hurt you, look what it’s done for meeee…

Page 40: Revised PPT

www.internet2.edu