Finding Network Problems That Influence Applications

22 Mar 2005 v0.4 Finding Network Problems that Influence Applications: Measurement Tools Internet2 Performance Workshop


Page 1: Finding Network Problems That Influence Applications

22 Mar 2005 v0.4

Finding Network Problems that Influence Applications: Measurement Tools

Internet2 Performance Workshop

Page 2: Finding Network Problems That Influence Applications

Outline

Problems, typical causes, diagnostic strategies

Examples showing usage of the tools we’ll be talking about today

End-to-End Measurement Infrastructure

Page 3: Finding Network Problems That Influence Applications

We Would Like Your Help

What problems are you experiencing?

Have you used a good tool?

Give us the benefit of your experience: successful problem resolution!

Page 4: Finding Network Problems That Influence Applications

What Are The Problems? (1)

Packet loss

Jitter

Out-of-order packets (extreme jitter)

Duplicated packets

Excessive latency
• Interactive applications
• TCP’s control system

Page 5: Finding Network Problems That Influence Applications

For TCP

Eliminating loss is the goal

Non-congestive losses especially tricky

TCP: 100 Mbit Ethernet coast-to-coast:
• Full size packets… need 10^-6 Ploss [Mathis]
• Less than 1 loss every 83 seconds
http://www.psc.edu/~mathis/papers/JTechs200105/

GigE: 10^-8, 1 loss every 497 seconds
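The loss targets above come from the Mathis et al. model, in which achievable TCP throughput is roughly (MSS / RTT) * C / sqrt(p). A minimal sketch of that relationship follows. The 1460-byte MSS, the 70 ms coast-to-coast RTT, and C = 1 are assumptions chosen for illustration; the slide does not state its exact constants, so the results agree with it only in order of magnitude.

```python
def max_loss_probability(rate_bps, rtt_s, mss_bytes=1460, c=1.0):
    """Largest loss probability p that still permits rate_bps under the
    Mathis model: rate = (MSS / RTT) * C / sqrt(p)."""
    mss_bits = mss_bytes * 8
    return (c * mss_bits / (rtt_s * rate_bps)) ** 2


def seconds_between_losses(rate_bps, p, mss_bytes=1460):
    """Mean time between losses when sending full-size packets at rate_bps."""
    packets_per_second = rate_bps / (mss_bytes * 8)
    return 1.0 / (p * packets_per_second)


if __name__ == "__main__":
    # Assumed RTT of 70 ms; both link speeds are taken from the slide.
    for name, rate in [("100 Mbit Ethernet", 100e6), ("GigE", 1e9)]:
        p = max_loss_probability(rate, rtt_s=0.070)
        gap = seconds_between_losses(rate, p)
        print(f"{name}: p <= {p:.1e}, about one loss every {gap:.0f} s")
```

Because p scales with the inverse square of the rate, a tenfold faster link needs a hundredfold lower loss rate, which is why the GigE target is two orders of magnitude tighter.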

Page 6: Finding Network Problems That Influence Applications

What Are The Problems? (2)

TCP: lack of buffer space
• Forces protocol into stop-and-wait
• Number one TCP-related performance problem.
• 70ms * 1Gbps = 70*10^6 bits, or 8.4MB
• 70ms * 100Mbps = 855KB
• Many stacks default to 64KB, or 7.4Mbps
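The buffer figures above are bandwidth-delay products. A small sketch of the arithmetic, assuming the slide's 70 ms RTT and binary units (1 KB = 1024 bytes) for the printed values; the function names are illustrative only.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Throughput ceiling when at most window_bytes can be sent per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.070  # 70 ms, as on the slide
    print(f"1 Gbps:   {bdp_bytes(1e9, rtt) / 2**20:.2f} MB of buffer")
    print(f"100 Mbps: {bdp_bytes(100e6, rtt) / 2**10:.0f} KB of buffer")
    ceiling = window_limited_rate_bps(64 * 1024, rtt)
    print(f"64 KB window: {ceiling / 1e6:.2f} Mbps at most")
```

The second function is the same relationship turned around: a fixed window divided by the RTT gives the rate the host can never exceed, whatever the link speed.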

Page 7: Finding Network Problems That Influence Applications

What Are The Problems? (3)

Video/Audio: lack of buffer space
• Makes broadcast streams very sensitive to previous problems

Application behaviors
• Stop-and-wait behavior; Can’t stream
• Lack of robustness to network anomalies

Page 8: Finding Network Problems That Influence Applications

The Usual Suspects

Host configuration errors (TCP buffers)

Duplex mismatch (Ethernet)

Wiring/Fiber problem

Bad equipment

Bad routing

Congestion
• “Real” traffic
• Unnecessary traffic (broadcasts, multicast, denial of service attacks)

Page 9: Finding Network Problems That Influence Applications

Strategy

Most problems are local…

Test ahead of time!

Is there connectivity & reasonable latency? (ping -> OWAMP)

Is routing reasonable? (traceroute)

Is host reasonable? (NDT; Web100)

Is path reasonable? (iperf -> BWCTL)

Page 10: Finding Network Problems That Influence Applications

One Technique: Problem Isolation via Divide and Conquer

Page 11: Finding Network Problems That Influence Applications

Outline

Problems, typical causes, diagnostic strategies

Examples showing usage of the tools we’ll be talking about today

End-to-End Measurement Infrastructure

Page 12: Finding Network Problems That Influence Applications

Tool Examples

When to use NDT

NDT in action at SC’04

When to use BWCTL

BWCTL in action with e-VLBI

When to use OWAMP

OWAMP in action with Abilene

Page 13: Finding Network Problems That Influence Applications

When to use NDT

When you want to know about last mile and host problems

When you want a quick and easy test that provides clues to the possible cause of a problem

When you want to understand large segments of the path from the host’s point of view

When a user wants to test their own host

Page 14: Finding Network Problems That Influence Applications

Technique

Start by testing to the nearest NDT server from each end of the problem path

This will help you with a majority of problems

If both tests indicate good performance, test to a distant NDT server

If tests still indicate good performance, suspect a problem in the application, not the host or network.
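The procedure above is a short decision sequence, sketched here as a function. The function name, argument names, and result strings are hypothetical. The branch for a failing distant test is an inference (the slide only covers the case where every test passes) and is marked as such in the code.

```python
def ndt_triage(near_ok_a, near_ok_b, distant_ok=None):
    """Suggest the next step after NDT tests from both ends of a problem path.

    near_ok_a / near_ok_b: did each end's test to its nearest NDT server
    show good performance?  distant_ok: result of the follow-up test to a
    distant server, or None if it has not been run yet.
    """
    failing = [name for name, ok in (("end A", near_ok_a), ("end B", near_ok_b))
               if not ok]
    if failing:
        # Nearby test failed: look at that host and its last mile first.
        return "investigate host/last mile at " + " and ".join(failing)
    if distant_ok is None:
        return "test to a distant NDT server"
    if not distant_ok:
        # Assumption, not stated on the slide: both ends are clean locally,
        # so the remaining suspect is the path between them.
        return "suspect the network path between the two ends"
    return "suspect the application, not the host or network"


if __name__ == "__main__":
    print(ndt_triage(True, False))
    print(ndt_triage(True, True))
    print(ndt_triage(True, True, distant_ok=True))
```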

Page 15: Finding Network Problems That Influence Applications

SC’04 Real Life Example

Booth having trouble getting application to run from Amsterdam to Pittsburgh

Tests between Amsterdam SGI and Pittsburgh PC showed throughput limited to < 20 Mbps

Assumption is: PC buffers too small

Question: How do we set the WinXP send/receive buffers?

Page 16: Finding Network Problems That Influence Applications

SC’04 Determine WinXP info

http://www.dslreports.com/drtcp

Page 17: Finding Network Problems That Influence Applications

SC’04 Confirm PC settings

DrTCP reported 16 MB buffers, but the test program was still slow. Q: How to confirm?

Run test to SCInet NDT server (PC has Fast Ethernet Connection)

• Client-to-Server: 90 Mbps
• Server-to-Client: 95 Mbps
• PC Send/Recv Buffer size: 16 Mbytes (wscale 8)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Reported TCP average RTT: 46.2 msec
– approximately 600 Kbytes of data in TCP buffer
• Min buffer size / RTT: 1.3 Gbps
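The wscale values in the NDT output refer to the TCP window scale option (RFC 7323, originally RFC 1323): the 16-bit window field is shifted left by wscale bits, which caps the window a host can advertise. A minimal sketch of that ceiling for the wscale values that appear in these tests; the function name is illustrative.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_WSCALE = 14              # upper limit on the shift count in RFC 7323


def max_window_bytes(wscale):
    """Largest receive window that can be advertised with this shift count."""
    if not 0 <= wscale <= MAX_WSCALE:
        raise ValueError("wscale must be between 0 and 14")
    return MAX_UNSCALED_WINDOW << wscale


if __name__ == "__main__":
    for wscale in (3, 5, 7, 8):
        print(f"wscale {wscale}: up to {max_window_bytes(wscale):,} bytes")
```

A low wscale on one end is therefore a quick hint that its buffers are small, before looking at the buffer sizes themselves.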

Page 18: Finding Network Problems That Influence Applications

SC’04 Local PC Configured OK

No problem found

Able to run at line rate

Confirmed that PC’s TCP buffers were set correctly

Page 19: Finding Network Problems That Influence Applications

SC’04 Amsterdam SGI

Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected).

Downloaded and built command line tool on SGI IRIX

• Client-to-Server: 17 Mbps
• Server-to-Client: 16 Mbps
• SGI Send/Recv Buffer size: 256 Kbytes (wscale 3)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Average RTT: 106.7 msec
• Min Buffer size / RTT: 19 Mbps

Page 20: Finding Network Problems That Influence Applications

SC’04 Amsterdam SGI (tuned)

Re-run test from remote SGI to SC show floor with -b # option.

• Client-to-Server: 107 Mbps
• Server-to-Client: 109 Mbps
• SGI Send/Recv Buffer size: 2 Mbytes (wscale 5)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Reported average RTT: 104 msec
• Min Buffer size / RTT: 153.8 Mbps
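The "Min Buffer size / RTT" figure in each NDT run is the smaller of the two ends' buffers divided by the round-trip time. A short sketch recomputing it for the three runs, assuming decimal units (1 Kbyte = 1000 bytes, 1 Mbyte = 10^6 bytes), which is the reading that matches the reported 153.8 Mbps; the labels are illustrative.

```python
def buffer_limited_mbps(min_buffer_bytes, rtt_ms):
    """Throughput ceiling in Mbps when one buffer's worth is sent per RTT."""
    return min_buffer_bytes * 8 / (rtt_ms / 1000.0) / 1e6


# (label, smaller of the two buffers in bytes, reported RTT in ms)
RUNS = [
    ("Pittsburgh PC to SCInet NDT", 8e6, 46.2),
    ("Amsterdam SGI, default buffers", 256e3, 106.7),
    ("Amsterdam SGI, tuned", 2e6, 104.0),
]

if __name__ == "__main__":
    for label, buf, rtt in RUNS:
        print(f"{label}: {buffer_limited_mbps(buf, rtt):.1f} Mbps ceiling")
```

In the untuned SGI run the measured 16 to 17 Mbps sits just under the computed ceiling, which is the signature of a buffer-limited connection; in the other two runs the ceiling is well above the measured rate.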

Page 21: Finding Network Problems That Influence Applications

SC’04 Debugging Results

Team spent over 1 hour looking at Win XP config, trying to verify Buffer size

• 2 tools used gave different results

Single NDT test verified this in under 30 seconds

10 minutes to download and install NDT client on SGI

15 minutes to discuss options and run client test with set buffer option

Page 22: Finding Network Problems That Influence Applications

SC’04 Debugging Results

8 Minutes to find SGI limits and determine maximum allowable buffer setting (2 MB)

Total time 34 minutes to verify problem was with remote server’s TCP send/receive buffer size

Network path verified but Application still performed poorly until it was also tuned

Page 23: Finding Network Problems That Influence Applications

When to use BWCTL

You want to understand segments of the path

You want to know if each segment can handle flows of a specific size

You want to know parameters such as bandwidth, packet loss and latency

To help design or tune an application based on available performance

Page 24: Finding Network Problems That Influence Applications

Technique

Divide and Conquer!

Look for segments with performance less than that required by the application
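Once each segment has been measured, the comparison against the application's requirement is mechanical. A minimal sketch, in which the segment names and throughput numbers are made up for illustration and would in practice come from BWCTL tests between intermediate test points.

```python
def weak_segments(measured_mbps, required_mbps):
    """Return the segments whose measured throughput is below the requirement."""
    return [segment for segment, rate in measured_mbps.items()
            if rate < required_mbps]


if __name__ == "__main__":
    # Hypothetical per-segment results along one end-to-end path.
    measured = {
        "campus A -> gigaPoP A": 940.0,
        "gigaPoP A -> backbone": 910.0,
        "backbone -> gigaPoP B": 905.0,
        "gigaPoP B -> campus B": 38.0,
    }
    print(weak_segments(measured, required_mbps=500.0))
```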

Page 25: Finding Network Problems That Influence Applications

e-VLBI Case Study

The e-VLBI project needed to move massive amounts of data between a number of sites around the world

They found that performance from some sites was only in the 1 Mbps range

They needed to understand why

Page 26: Finding Network Problems That Influence Applications

e-VLBI test infrastructure

David Lapsley, one of the research engineers, established BWCTL servers at the sites of the project.

• Japan: Kashima Observatory
• Sweden: Onsala Observatory
• US: Haystack (BOS)

He performed a full mesh of tests between all of the servers
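A full mesh means every site tests to every other site. A small sketch that enumerates the test pairs for the three sites listed above. Treating each direction as a separate test is an assumption made here because throughput can differ by direction; the slide does not say how the mesh was scheduled.

```python
from itertools import permutations

SITES = ["Kashima", "Onsala", "Haystack"]


def full_mesh(sites):
    """All ordered (sender, receiver) pairs between distinct sites."""
    return list(permutations(sites, 2))


if __name__ == "__main__":
    for sender, receiver in full_mesh(SITES):
        print(f"{sender} -> {receiver}")
```

The number of tests grows as n * (n - 1), so three sites need six directed tests and ten sites would need ninety.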

Page 27: Finding Network Problems That Influence Applications

e-VLBI Results #1

They used Abilene nodes to divide the problem path

David found that there was considerable packet loss in the area of Haystack Observatory

Working with network folk from the area, the problem was isolated and resolved

Page 28: Finding Network Problems That Influence Applications

e-VLBI Results #2

For one site that was using the commodity Internet, only 1 Mbps was regularly seen

The application was changed to locate caching to reduce dependence on that site.

Page 29: Finding Network Problems That Influence Applications

e-VLBI Regular Testing

They found the testing to be very useful in understanding the network status

They established a regular testing schedule

They established a web site for reporting the results

All researchers can check the network status
http://web.haystack.mit.edu/staff/dlapsley/tsev7.html

Page 30: Finding Network Problems That Influence Applications

When to use OWAMP

Want baseline “heartbeat” information

Asymmetric routes can make problem location more difficult

OWAMP can provide detailed performance on one direction in the path

When you want to know precise latency information

Good for helping real-time applications

Page 31: Finding Network Problems That Influence Applications

Why use OWAMP

It is very sensitive to minor network changes

• Route changes
• Packet queuing

It tells you about one direction of the path

Page 32: Finding Network Problems That Influence Applications

OWAMP Case Study: Queuing on Abilene

Tuesday, 2004-08-17, 16:05-16:20 UTC

That’s 12:05 to 12:20 EDT

Caltech to CERN performing 10GE throughput experiment

• Single adapter to date, PCI-X

• Theoretical limit of ~8.5 Gbps

• Practical limit closer to 7.5 Gbps

• Exactly what was tested at that time is unknown

“Worst 10” delay list had some larger than normal variances… to date, software issues

Page 33: Finding Network Problems That Influence Applications

One Link’s History

The Denver to KSCY Link

Page 34: Finding Network Problems That Influence Applications

What It Shows

Only paths that traverse DNVR>KSCY showed additional delay

Some delayed by an extra ~35 msec

Probable cause: the router started queuing packets, creating a small delay

It tells you that there is congestion on the link.
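The reasoning on this slide, that only paths crossing one link show extra delay, can be expressed as set arithmetic over one-way paths. A sketch under stated assumptions: the paths below are illustrative hop lists using Abilene-style node names, not the measured routes, and the function names are hypothetical. Links are kept as ordered pairs because OWAMP measures each direction separately.

```python
def links(path):
    """Directed links (hop pairs) traversed by a one-way path."""
    return set(zip(path, path[1:]))


def suspect_links(delayed_paths, normal_paths):
    """Links shared by every delayed path and used by no normal path."""
    if not delayed_paths:
        return set()
    shared = set.intersection(*(links(p) for p in delayed_paths))
    clean = set().union(*(links(p) for p in normal_paths))
    return shared - clean


if __name__ == "__main__":
    # Hypothetical one-way paths, for illustration only.
    delayed = [["STTL", "DNVR", "KSCY", "IPLS"],
               ["SNVA", "DNVR", "KSCY", "HSTN"]]
    normal = [["KSCY", "DNVR", "STTL"],
              ["STTL", "DNVR", "SNVA"]]
    print(suspect_links(delayed, normal))
```

With these inputs only the DNVR to KSCY direction is left as a suspect; the reverse direction is cleared because a path with normal delay uses it.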

Page 35: Finding Network Problems That Influence Applications

Outline

Problems, typical causes, diagnostic strategies

Examples showing usage of the tools we’ll be talking about today

End-to-End Measurement Infrastructure

Page 36: Finding Network Problems That Influence Applications

End-to-End Measurement Infrastructure Vision

Ongoing monitoring to test major elements, and end-to-end paths.

• Elements: gigaPoP links, peering, …
• Utilization
• Delay
• Loss
• Occasional throughput
• Multicast connectivity

Page 37: Finding Network Problems That Influence Applications

End-to-End Measurement Infrastructure Vision II

Many more end to end paths than can be monitored.

Diagnostic tools available on-demand (with authorization)

• Show routes
• Perform flow tests (perhaps app tests)
• Parse/debug flows (a la tcpdump or OCxMON with heuristic tools)

Page 38: Finding Network Problems That Influence Applications

What Campuses Can Do

Export SNMP data
• I have an “Internet2 list”, can add you
• Monitor loss as well as throughput

Performance test point at campus edge
• Hopefully, the result of today’s workshop
• Possibly also traceroute “looking glass”
• Commercial (e.g., NetIQ) complements
• We have a master list

Page 39: Finding Network Problems That Influence Applications

Strategy (references) (1)

See also
• http://e2epi.internet2.edu/ Look at stories, documents, tools
• http://e2epi.internet2.edu/ndt/ Pointer to the tool, and using it for debugging the last mile

Page 40: Finding Network Problems That Influence Applications

Strategy (references) (2)

•http://www.psc.edu/networking/projects/tcptune/ How to tweak OS parameters (also scp pointer)

•http://www.ncne.org/research/tcp/ TCP debugging the detailed way

•http://dast.nlanr.net/Guides/WritingApps/ Tips for app writers

•http://dast.nlanr.net/Guides/GettingStarted And some checking to do by hand & debugging.

Page 41: Finding Network Problems That Influence Applications

www.internet2.edu

Page 42: Finding Network Problems That Influence Applications

Acknowledgements

The original presentation by Matt Zekauskas using ideas inspired by material from NLANR DAST, Matt Mathis, and others.

Copyright Internet2 2005, All Rights Reserved.

Page 43: Finding Network Problems That Influence Applications

Background: Detailed Tools Discussion

Page 44: Finding Network Problems That Influence Applications

Background: Tools Outline

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

Page 45: Finding Network Problems That Influence Applications

Internet2 Detective

A simple “is there any hope” tool
• Windows “tray” application
• Red/green lights, am I on Internet2
• Multicast available
• IPv6 available

http://detective.internet2.edu/

Page 46: Finding Network Problems That Influence Applications

NLANR Performance Advisor

Geared for the naive user

Run at both ends, and see if a standard problem is detected.

Can also work with intermediate servers

http://dast.nlanr.net/Projects/Advisor

Page 47: Finding Network Problems That Influence Applications

NDT

Network Diagnostic Tool

Java applet

Connects to server in middle, runs tests, and evaluates heuristics looking for host and first mile problems.

Has detailed output.

You’ll see lots of detail later today.

A commercial tool that tests for TCP buffer problems: http://www.dslreports.com/tweaks/

Page 48: Finding Network Problems That Influence Applications

Host/OS Tuning: Web100

Goal: TCP stack, tuning not bottleneck

Large measurement component
• TCP performance not what you expect? Ask TCP why!
– Receiver bottleneck (out of receiver window)
– Sender bottleneck (no data to send)
– Path bottleneck (out of congestion window)
– Path anomalies (duplicate, out of order, loss)

www.web100.org

Page 49: Finding Network Problems That Influence Applications

Reference Servers (Beacons)

H.323 conferencing
• Goal: portable machines that tell you if system likely to work (and if not, why?)
• Moderate-rate UDP of interest
• E.g., H.323 Beacon http://www.osc.edu/oarnet/itecohio.net/beacon/
• ViDeNet Scout, http://scout.video.unc.edu/

Page 50: Finding Network Problems That Influence Applications

Background: Tools Outline

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

Page 51: Finding Network Problems That Influence Applications

OWAMP – Latency/Loss

One-Way Active Measurement Protocol

Requires NTP-Synchronized clocks

Look for one-way latency, loss

Authentication and Scheduling

Again, lots more later today

Page 52: Finding Network Problems That Influence Applications

BWCTL -- Throughput

A tool for throughput testing that includes scheduling and authentication.

Currently uses iperf for actual tests.

Can assign users (or IP addresses) to classes, give classes different throughput limits or time limits.

Periodic and on-demand testing.

Lots more later today.

Page 53: Finding Network Problems That Influence Applications

Background: Tools Outline

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

Page 54: Finding Network Problems That Influence Applications

Some Commercial Tools

Caveat: only a partial list, give me more!

Spirent (nee Netcom/Adtech):
• SmartBits: test at low & high rates, QoS; test components or end-to-end path

NetIQ: Chariot/Pegasus

Agilent (like SmartBits, and FireHunter)

Ixia (like SmartBits/Spirent)

Brix Networks (like AMP/OWAMP, for ‘QoS’)

Apparent Networks: path debugger

Page 55: Finding Network Problems That Influence Applications

Some Noncommercial Tools

Iperf: dast.nlanr.net/Projects/iperf
• See also http://www-itg.lbl.gov/nettest/
• http://www-didc.lbl.gov/NCS/

Flowscan:
• http://www.caida.org/tools/utilities/flowscan/
• http://net.doit.wisc.edu/~plonka/FlowScan/

SLAC’s traceroute perl script:
• http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html

One large list:
• http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html

Page 56: Finding Network Problems That Influence Applications

Background: Tools Outline

Tools: First mile, host issues

Tools: Path issues

Tools: Others to be aware of

Tools within Abilene

Page 57: Finding Network Problems That Influence Applications

Abilene: Measurements from the Center

Active (latency, throughput)
• Measurement within Abilene
• Measurements to the edge

Passive
• SNMP stats (esp. core Abilene links)
• Variables via router proxy
• Router configuration
• Route state
• Characterization of traffic
– Netflow; OCxMON

Page 58: Finding Network Problems That Influence Applications

Goal

Abilene goal to be an exemplar
• Measurements open
• Tests possible to router nodes
• Throughput tests routinely through backbone
• …as well as existing utilization, etc.
• The “Abilene Observatory” http://abilene.internet2.edu/observatory

Page 59: Finding Network Problems That Influence Applications

Abilene: Machines

GigE connected high-performance tester
• bwctl, “nms1”, 9000 byte MTU

Latency tester
• owamp, “nms4”, 100bT

Stats collection
• SNMP, flow-stats, “nms3”, 100bT

Ad-hoc tests
• NDT server, “nms2”, gigE, 1500 byte MTU

Page 60: Finding Network Problems That Influence Applications

Throughput

Take tests 1/hr, 20 seconds each
• IPv4 TCP
• IPv6 TCP (no discernable difference)
• IPv4 UDP (on our platforms flakey at 1G)
• IPv6 UDP (ditto)

Others test to our nodes

Others test amongst themselves

Net result: 25% of traffic (NOT capacity) is measurement

Page 61: Finding Network Problems That Influence Applications

Latency

CDMA used to synchronize NTP
• www.endruntechnologies.com

Test among all router node pairs

10/sec

IPv4 and IPv6

Minimal sized packets

Poisson schedule
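A Poisson schedule means the gaps between probe packets are drawn from an exponential distribution, so probes do not fall into step with periodic network behavior. A minimal sketch of generating send times at the slide's average of 10 packets per second; this illustrates the idea only and is not how the owamp tool itself is implemented.

```python
import random


def poisson_send_times(rate_per_s, count, seed=None):
    """Send times (seconds from start) with exponential inter-packet gaps."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(count):
        now += rng.expovariate(rate_per_s)  # mean gap is 1 / rate_per_s
        times.append(now)
    return times


if __name__ == "__main__":
    times = poisson_send_times(rate_per_s=10, count=10_000, seed=1)
    print(f"first five send times: {[round(t, 3) for t in times[:5]]}")
    print(f"mean gap: {times[-1] / len(times):.4f} s")
```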

Page 62: Finding Network Problems That Influence Applications

Passive - Utilization

The Abilene NOC takes
• Packets in, out
• Bytes in, out
• Drops/Errors
• ..for all interfaces, publishes internal links & peering points (at 5 min intervals)
• ..via SNMP polling – every 60 sec

http://loadrunner.uits.iu.edu/weathermaps/abilene/abilene.html
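Utilization is derived from successive readings of an interface's byte counters. A sketch of that calculation for one polling interval, assuming a counter that wraps at 2^32 (or 2^64) and at most one wrap between polls; the function and parameter names are illustrative and no SNMP library is involved.

```python
def utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Fraction of link capacity used between two byte-counter readings.

    The modulo handles a single counter wrap between polls.
    """
    delta_octets = (curr_octets - prev_octets) % (1 << counter_bits)
    return delta_octets * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 60 seconds apart on a 100 Mbps interface.
    print(f"{utilization(1_000_000, 376_000_000, 60, 100e6):.1%}")
    # The counter wrapped between the two polls.
    print(f"{utilization(2**32 - 1000, 1000, 60, 100e6):.6%}")
```

A 32-bit byte counter wraps in about 34 seconds at 1 Gbps, so with 60-second polls on fast links the 64-bit counters are needed for the single-wrap assumption to hold.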

Page 63: Finding Network Problems That Influence Applications

Page 64: Finding Network Problems That Influence Applications

Abilene Pointers

http://www.abilene.iu.edu/
• Monitoring
• Tools

http://www.itec.oar.net/abilene-netflow

http://netflow.internet2.edu/weekly/ (summaries)