Finding Network Problems That Influence Applications
22 Mar 2005 v0.4
Finding Network Problems that Influence Applications: Measurement Tools
Internet2 Performance Workshop
V0.4 22-Mar-2005 2Finding Network Problems: Measurement Tools
Outline
Problems, typical causes, diagnostic strategies
Examples showing usage of the tools we’ll be talking about today
End-to-End Measurement Infrastructure
We Would Like Your Help
What problems are you experiencing?
Have you used a good tool?
Give us the benefit of your experience: successful problem resolution!
What Are The Problems? (1)
Packet loss
Jitter
Out-of-order packets (extreme jitter)
Duplicated packets
Excessive latency
• Interactive applications
• TCP’s control system
For TCP
Eliminating loss is the goal
Non-congestive losses especially tricky
TCP: 100 Mbit Ethernet coast-to-coast:
• Full size packets… need 10^-6 Ploss [Mathis]
• Less than 1 loss every 83 seconds
http://www.psc.edu/~mathis/papers/JTechs200105/
GigE: 10^-8, 1 loss every 497 seconds
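The loss targets above come from the Mathis model, which bounds steady-state TCP throughput by roughly (packet size / RTT) * C / sqrt(p). The sketch below works that arithmetic backwards to get the loss probability a target rate can tolerate. The 70 ms RTT, the 1500-byte packet size and the constant C = 0.7 are assumptions chosen for illustration; this slide does not state them.

```python
def required_loss_rate(target_bps, rtt_s, packet_bytes=1500, c=0.7):
    """Largest loss probability p that still allows target_bps under the
    Mathis model: rate <= (packet_bits / rtt) * c / sqrt(p).
    packet_bytes and c are assumed values, not taken from the slides."""
    packet_bits = packet_bytes * 8
    return (packet_bits * c / (rtt_s * target_bps)) ** 2


def seconds_between_losses(target_bps, rtt_s, packet_bytes=1500, c=0.7):
    """Average time between single losses when sending at target_bps."""
    p = required_loss_rate(target_bps, rtt_s, packet_bytes, c)
    packets_per_second = target_bps / (packet_bytes * 8)
    return 1.0 / (p * packets_per_second)


if __name__ == "__main__":
    for name, bps in [("100 Mbit", 100e6), ("GigE", 1e9)]:
        p = required_loss_rate(bps, rtt_s=0.070)
        gap = seconds_between_losses(bps, rtt_s=0.070)
        print(f"{name}: p <= {p:.2e}, about one loss every {gap:.0f} s")
```

With these assumed inputs the 100 Mbit case lands near the slide's figures (p on the order of 10^-6, roughly 83 seconds between losses). The GigE interval on the slide appears to rest on inputs the transcript does not give, so the sketch should not be expected to reproduce it exactly.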
What Are The Problems? (2)
TCP: lack of buffer space
• Forces protocol into stop-and-wait
• Number one TCP-related performance problem.
• 70ms * 1Gbps = 70*10^6 bits, or 8.4MB
• 70ms * 100Mbps = 855KB
• Many stacks default to 64KB, or 7.4Mbps
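The buffer figures above are bandwidth-delay products: the number of bytes that must be in flight to keep the path full. A minimal sketch of the arithmetic follows, assuming the slide's MB and KB are binary units (1024-based). The function names are illustrative.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bytes in flight needed to keep a path of this bandwidth and RTT full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(buffer_bytes, rtt_s):
    """Best-case rate when at most buffer_bytes can be unacknowledged per RTT."""
    return buffer_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.070  # 70 ms, as on the slide
    mib = 1024 * 1024
    print(f"1 Gbps needs   {bdp_bytes(1e9, rtt) / mib:.2f} MiB")
    print(f"100 Mbps needs {bdp_bytes(100e6, rtt) / 1024:.0f} KiB")
    print(f"64 KiB allows  {window_limited_bps(64 * 1024, rtt) / 1e6:.2f} Mbps")
```

The results agree with the slide's figures to within rounding.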
What Are The Problems? (3)
Video/Audio: lack of buffer space
• Makes broadcast streams very sensitive to previous problems
Application behaviors
• Stop-and-wait behavior; can’t stream
• Lack of robustness to network anomalies
The Usual Suspects
Host configuration errors (TCP buffers)
Duplex mismatch (Ethernet)
Wiring/Fiber problem
Bad equipment
Bad routing
Congestion
• “Real” traffic
• Unnecessary traffic (broadcasts, multicast, denial of service attacks)
Strategy
Most problems are local…
Test ahead of time!
Is there connectivity & reasonable latency? (ping -> OWAMP)
Is routing reasonable? (traceroute)
Is host reasonable? (NDT; Web100)
Is path reasonable? (iperf -> BWCTL)
One Technique: Problem Isolation via Divide and Conquer
Outline
Problems, typical causes, diagnostic strategies
Examples showing usage of the tools we’ll be talking about today
End-to-End Measurement Infrastructure
Tool Examples
When to use NDT
NDT in action at SC’04
When to use BWCTL
BWCTL in action with e-VLBI
When to use OWAMP
OWAMP in action with Abilene
When to use NDT
When you want to know about last mile and host problems
When you want a quick and easy test that provides clues to the possible cause of a problem
When you want to understand large segments of the path from the host view point
When a user wants to test their own host
Technique
Start by testing to the nearest NDT server from each end of the problem path
This will help you with a majority of problems
If both tests indicate good performance, test to a distant NDT server
If tests still indicate good performance, suspect a problem in the application, not the host or network.
SC’04 Real Life Example
Booth having trouble getting application to run from Amsterdam to Pittsburgh
Tests between Amsterdam SGI and Pittsburgh PC showed throughput limited to < 20 Mbps
Assumption is: PC buffers too small
Question: How do we set the WinXP send/receive buffers?
SC’04 Determine WinXP info
http://www.dslreports.com/drtcp
SC’04 Confirm PC settings
DrTCP reported 16 MB buffers, but the test program was still slow
Q: How to confirm?
Run test to SCInet NDT server (PC has Fast Ethernet Connection)
• Client-to-Server: 90 Mbps
• Server-to-Client: 95 Mbps
• PC Send/Recv Buffer size: 16 Mbytes (wscale 8)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Reported TCP average RTT: 46.2 msec
– approximately 600 Kbytes of data in TCP buffer
• Min buffer size / RTT: 1.3 Gbps
SC’04 Local PC Configured OK
No problem found
Able to run at line rate
Confirmed that PC’s TCP buffers were set correctly
SC’04 Amsterdam SGI
Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected).
Downloaded and built command line tool on SGI IRIX
• Client-to-Server: 17 Mbps
• Server-to-Client: 16 Mbps
• SGI Send/Recv Buffer size: 256 Kbytes (wscale 3)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Average RTT: 106.7 msec
• Min Buffer size / RTT: 19 Mbps
SC’04 Amsterdam SGI (tuned)
Re-run test from remote SGI to SC show floor with –b # option.
• Client-to-Server: 107 Mbps
• Server-to-Client: 109 Mbps
• SGI Send/Recv Buffer size: 2 Mbytes (wscale 5)
• NDT Send/Recv Buffer Size: 8 Mbytes (wscale 7)
• Reported average RTT: 104 msec
• Min Buffer size / RTT: 153.8 Mbps
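The "Min Buffer size / RTT" line in these NDT reports is the key diagnostic: if that ceiling is below the link rate and the measured throughput sits close to it, the smaller TCP buffer is the likely bottleneck. The sketch below applies that check to the three SC’04 runs. The 75% closeness threshold and the function names are assumptions for illustration, not part of NDT, and buffer sizes are taken as decimal units.

```python
def buffer_ceiling_bps(local_buf_bytes, remote_buf_bytes, rtt_s):
    """Throughput ceiling set by the smaller of the two TCP buffers."""
    return min(local_buf_bytes, remote_buf_bytes) * 8 / rtt_s


def likely_buffer_limited(measured_bps, ceiling_bps, link_bps, closeness=0.75):
    """True when the buffer ceiling is below the link rate and the measured
    rate is at least `closeness` of that ceiling (assumed threshold)."""
    return ceiling_bps < link_bps and measured_bps >= closeness * ceiling_bps


if __name__ == "__main__":
    runs = [
        # name, local buffer, remote buffer, RTT (s), link rate, measured rate
        ("PC", 16e6, 8e6, 0.0462, 100e6, 90e6),
        ("SGI untuned", 256e3, 8e6, 0.1067, 1e9, 17e6),
        ("SGI tuned", 2e6, 8e6, 0.104, 1e9, 107e6),
    ]
    for name, local, remote, rtt, link, measured in runs:
        ceiling = buffer_ceiling_bps(local, remote, rtt)
        verdict = likely_buffer_limited(measured, ceiling, link)
        print(f"{name}: ceiling {ceiling / 1e6:.1f} Mbps, buffer-limited: {verdict}")
```

Under these assumptions only the untuned SGI run is flagged as buffer-limited. In the tuned run the measured rate sits well below the new ceiling, so the buffer is no longer flagged.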
SC’04 Debugging Results
Team spent over 1 hour looking at Win XP config, trying to verify Buffer size
• 2 tools used gave different results
Single NDT test verified this in under 30 seconds
10 minutes to download and install NDT client on SGI
15 minutes to discuss options and run client test with set buffer option
SC’04 Debugging Results
8 Minutes to find SGI limits and determine maximum allowable buffer setting (2 MB)
Total time: 34 minutes to verify that the problem was the remote server’s TCP send/receive buffer size
Network path verified but Application still performed poorly until it was also tuned
When to use BWCTL
You want to understand segments of the path
You want to know if each segment can handle flows of a specific size
You want to know parameters such as bandwidth, packet loss and latency
To help design or tune an application based on available performance
Technique
Divide and Conquer!
Look for segments with performance less than that required by the application
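Once each segment of the path has been tested (for example with BWCTL between intermediate measurement points), the divide-and-conquer step reduces to a comparison against the application's requirement. A minimal sketch follows; the segment names and throughput numbers are invented for illustration.

```python
def weak_segments(segment_mbps, required_mbps):
    """Return the names of segments whose measured throughput falls below
    the application's requirement, slowest first."""
    slow = [(mbps, name) for name, mbps in segment_mbps.items()
            if mbps < required_mbps]
    return [name for mbps, name in sorted(slow)]


if __name__ == "__main__":
    # Hypothetical per-segment results in Mbps; not measurements from the talk.
    measured = {
        "campus A -> gigaPoP A": 940.0,
        "gigaPoP A -> backbone": 910.0,
        "backbone -> gigaPoP B": 880.0,
        "gigaPoP B -> campus B": 45.0,
    }
    print(weak_segments(measured, required_mbps=200.0))
```

Any segment returned is a candidate for further subdivision or for a conversation with the operator of that segment.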
e-VLBI Case Study
The e-VLBI project needed to move massive amounts of data between a number of sites around the world
They found that performance from some sites was only in the 1 Mbps range
They needed to understand why
e-VLBI Test Infrastructure
David Lapsley, one of the research engineers, established BWCTL servers at the sites of the project.
• Japan: Kashima Observatory
• Sweden: Onsala Observatory
• US: Haystack (BOS)
He performed a full mesh of tests between all of the servers
e-VLBI Results #1
They used Abilene nodes to divide the problem path
David found that there was considerable packet loss in the area of Haystack Observatory
Working with network folk from the area, the problem was isolated and resolved
e-VLBI Results #2
For one site that was using the commodity Internet, only 1 Mbps was regularly seen
The application was changed to locate caching to reduce dependence on that site.
e-VLBI Regular Testing
They found the testing to be very useful in understanding the network status
They established a regular testing schedule
They established a web site for reporting the results
All researchers can check the network status
http://web.haystack.mit.edu/staff/dlapsley/tsev7.html
When to use OWAMP
Want baseline “heartbeat” information
Asymmetric routes can make problem location more difficult
OWAMP can provide detailed performance on one direction in the path
When you want to know precise latency information
Good for helping real-time applications
Why use OWAMP
It is very sensitive to minor network changes
• Route changes
• Packet queuing
It tells you about one direction of the path
OWAMP Case Study Queuing on Abilene
Tuesday, 2004-08-17, 16:05-16:20 UTC
That’s 11:05 to 11:20 EDT
Caltech to CERN performing 10GE throughput experiment
• Single adapter to date, PCI-X
• Theoretical limit of ~8.5 Gbps
• Practical limit closer to 7.5 Gbps
• Exactly what was tested at that time is unknown
“Worst 10” delay list had some larger than normal variances… to date, software issues
One Link’s History
The Denver to KSCY Link
What It Shows
Only paths that traverse DNVR>KSCY showed additional delay
Some delayed by an extra ~35 msec
Probable cause: a router started queuing packets, creating a small delay
It tells you that there is congestion on the link.
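One way to turn one-way delay samples into a queuing signal is to treat the minimum observed delay as the unloaded baseline and look at the excess over it. The sketch below assumes a stable route during the sample window; the 5 ms threshold and the sample values are made up for illustration and are not OWAMP output.

```python
def queuing_delay_ms(samples_ms):
    """Excess of each one-way delay sample over the minimum observed delay,
    which serves as an estimate of the unloaded (propagation) delay."""
    baseline = min(samples_ms)
    return [s - baseline for s in samples_ms]


def delayed_samples(samples_ms, threshold_ms=5.0):
    """Indices of samples whose excess delay exceeds threshold_ms
    (an assumed cutoff, to be tuned per path)."""
    return [i for i, extra in enumerate(queuing_delay_ms(samples_ms))
            if extra > threshold_ms]


if __name__ == "__main__":
    # Hypothetical one-way delays in msec for a single path.
    samples = [10.1, 10.0, 10.2, 45.0, 44.8, 10.1]
    print(delayed_samples(samples))
```

If only the paths that share a given link show this excess at the same time, that link is the likely place where queuing occurred.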
Outline
Problems, typical causes, diagnostic strategies
Examples showing usage of the tools we’ll be talking about today
End-to-End Measurement Infrastructure
End-to-End Measurement Infrastructure Vision
Ongoing monitoring to test major elements, and end-to-end paths.
• Elements: gigaPoP links, peering, …
• Utilization
• Delay
• Loss
• Occasional throughput
• Multicast connectivity
End-to-End Measurement Infrastructure Vision II
Many more end to end paths than can be monitored.
Diagnostic tools available on-demand (with authorization)
• Show routes
• Perform flow tests (perhaps app tests)
• Parse/debug flows (a la tcpdump or OCXmon with heuristic tools)
What Campuses Can Do
Export SNMP data
• I have an “Internet2 list”, can add you
• Monitor loss as well as throughput
Performance test point at campus edge
• Hopefully, the result of today’s workshop
• Possibly also traceroute “looking glass”
• Commercial (e.g., NetIQ) complements
• We have a master list
Strategy (references) (1)
See also
• http://e2epi.internet2.edu/
Look at stories, documents, tools
• http://e2epi.internet2.edu/ndt/
Pointer to the tool, and using it for debugging the last mile
Strategy (references) (2)
• http://www.psc.edu/networking/projects/tcptune/
How to tweak OS parameters (also scp pointer)
• http://www.ncne.org/research/tcp/
TCP debugging the detailed way
• http://dast.nlanr.net/Guides/WritingApps/
Tips for app writers
• http://dast.nlanr.net/Guides/GettingStarted
And some checking to do by hand & debugging.
www.internet2.edu
Acknowledgements
The original presentation by Matt Zekauskas using ideas inspired by material from NLANR DAST, Matt Mathis, and others.
Copyright Internet2 2005, All Rights Reserved.
22 Mar 2005 v0.4
Background: Detailed Tools Discussion
Background: Tools Outline
Tools: First mile, host issues
Tools: Path issues
Tools: Others to be aware of
Tools within Abilene
Internet2 Detective
A simple “is there any hope” tool
• Windows “tray” application
• Red/green lights, am I on Internet2
• Multicast available
• IPv6 available
http://detective.internet2.edu/
NLANR Performance Advisor
Geared for the naive user
Run at both ends, and see if a standard problem is detected.
Can also work with intermediate servers
http://dast.nlanr.net/Projects/Advisor
NDT
Network Diagnostic Tool
Java applet
Connects to server in middle, runs tests, and evaluates heuristics looking for host and first mile problems.
Has detailed output.
You’ll see lots of detail later today.
A commercial tool that tests for TCP buffer problems: http://www.dslreports.com/tweaks/
Host/OS Tuning: Web100
Goal: TCP stack, tuning not bottleneck
Large measurement component
• TCP performance not what you expect? Ask TCP why!
– Receiver bottleneck (out of receiver window)
– Sender bottleneck (no data to send)
– Path bottleneck (out of congestion window)
– Path anomalies (duplicate, out of order, loss)
www.web100.org
Reference Servers (Beacons)
H.323 conferencing
• Goal: portable machines that tell you if system likely to work (and if not, why?)
• Moderate-rate UDP of interest
• E.g., H.323 Beacon http://www.osc.edu/oarnet/itecohio.net/beacon/
• ViDeNet Scout, http://scout.video.unc.edu/
Background: Tools Outline
Tools: First mile, host issues
Tools: Path issues
Tools: Others to be aware of
Tools within Abilene
OWAMP – Latency/Loss
One-Way Active Measurement Protocol
Requires NTP-Synchronized clocks
Look for one-way latency, loss
Authentication and Scheduling
Again, lots more later today
BWCTL -- Throughput
A tool for throughput testing that includes scheduling and authentication.
Currently uses iperf for actual tests.
Can assign users (or IP addresses) to classes, give classes different throughput limits or time limits.
Periodic and on-demand testing.
Lots more later today.
Background: Tools Outline
Tools: First mile, host issues
Tools: Path issues
Tools: Others to be aware of
Tools within Abilene
Some Commercial Tools
Caveat: only a partial list, give me more!
Spirent (nee Netcom/Adtech):
• SmartBits: test at low & high rates, QoS; test components or end-to-end path
NetIQ: Chariot/Pegasus
Agilent (like SmartBits, and FireHunter)
Ixia (like SmartBits/Spirent)
Brix Networks (like AMP/Owamp, for ‘QoS’)
Apparent Networks: path debugger
Some Noncommercial Tools
Iperf: dast.nlanr.net/Projects/iperf
• See also http://www-itg.lbl.gov/nettest/
• http://www-didc.lbl.gov/NCS/
Flowscan:
• http://www.caida.org/tools/utilities/flowscan/
• http://net.doit.wisc.edu/~plonka/FlowScan/
SLAC’s traceroute perl script:
• http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
One large list:
• http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
Background: Tools Outline
Tools: First mile, host issues
Tools: Path issues
Tools: Others to be aware of
Tools within Abilene
Abilene: Measurements from the Center
Active (latency, throughput)
• Measurement within Abilene
• Measurements to the edge
Passive
• SNMP stats (esp. core Abilene links)
• Variables via router proxy
• Router configuration
• Route state
• Characterization of traffic
– Netflow; OCxMON
Goal
Abilene goal to be an exemplar
• Measurements open
• Tests possible to router nodes
• Throughput tests routinely through backbone
• …as well as existing utilization, etc.
• The “Abilene Observatory” http://abilene.internet2.edu/observatory
Abilene: Machines
GigE connected high-performance tester
• bwctl, “nms1”, 9000 byte MTU
Latency tester
• owamp, “nms4”, 100bT
Stats collection
• SNMP, flow-stats, “nms3”, 100bT
Ad-hoc tests
• NDT server, “nms2”, gigE, 1500 byte MTU
Throughput
Take tests 1/hr, 20 seconds each
• IPv4 TCP
• IPv6 TCP (no discernible difference)
• IPv4 UDP (on our platforms flaky at 1G)
• IPv6 UDP (ditto)
Others test to our nodes
Others test amongst themselves
Net result: 25% of traffic (NOT capacity) is measurement
Latency
CDMA used to synchronize NTP
• www.endruntechnologies.com
Test among all router node pairs
10/sec
IPv4 and IPv6
Minimal sized packets
Poisson schedule
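A Poisson schedule means the gaps between probe packets are drawn from an exponential distribution, so probes average the stated rate without falling into step with periodic events on the network. The sketch below generates such a schedule at the 10/sec rate given above. The function name and the seed parameter are illustrative; this is not how OWAMP itself is implemented.

```python
import random


def poisson_send_times(rate_per_s, duration_s, seed=None):
    """Send times (seconds from start) for a Poisson probe schedule:
    inter-packet gaps are exponential with mean 1 / rate_per_s."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)


if __name__ == "__main__":
    schedule = poisson_send_times(rate_per_s=10, duration_s=60, seed=1)
    gaps = [b - a for a, b in zip(schedule, schedule[1:])]
    print(f"{len(schedule)} probes, mean gap {sum(gaps) / len(gaps):.3f} s")
```

Over a long run the probe count approaches rate * duration, while individual gaps vary widely.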
Passive - Utilization
The Abilene NOC takes
• Packets in, out
• Bytes in, out
• Drops/Errors
• …for all interfaces, publishes internal links & peering points (at 5 min intervals)
• …via SNMP polling, every 60 sec
http://loadrunner.uits.iu.edu/weathermaps/abilene/abilene.html
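Utilization is derived from the polled byte counters by differencing two successive readings and dividing by the polling interval and link rate. The sketch below shows that calculation with wrap handling for fixed-width counters. It assumes the counter wrapped at most once between polls, and the function names are illustrative rather than taken from any NOC tooling.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping SNMP counter,
    assuming it wrapped at most once between polls."""
    return (current - previous) % (1 << bits)


def utilization(prev_octets, cur_octets, interval_s, link_bps, bits=32):
    """Fraction of link capacity used over one polling interval."""
    octets = counter_delta(prev_octets, cur_octets, bits)
    return octets * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # Hypothetical readings 60 seconds apart on a 1 Gbps interface.
    print(f"{utilization(4294967000, 749999704, 60, 1e9):.1%}")
```

On fast links a 32-bit byte counter can wrap more than once within a 60 second poll (at 10 Gbps it wraps in under 4 seconds), so the 64-bit counters would be the safer choice there; pass bits=64 in that case.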
Abilene Pointers
http://www.abilene.iu.edu/
• Monitoring
• Tools
http://www.itec.oar.net/abilene-netflow
http://netflow.internet2.edu/weekly/ (summaries)