Rich Carlson

The Performance Bottleneck: Application, Computer, or Network. Richard Carlson, Internet2. Part 1


Transcript of Rich Carlson

Page 1: Rich Carlson

The Performance Bottleneck: Application, Computer, or Network

Richard Carlson

Internet2

Part 1

Page 2: Rich Carlson

Outline

• Why there is a problem

• What can be done to find/fix problems

• Tools you can use

• Ramblings on what’s next

Page 3: Rich Carlson

Basic Premise

• Application performance should meet your expectations!

• If it doesn’t, you should complain!

Page 4: Rich Carlson

Questions

• How many times have you said:
  • What’s wrong with the network?
  • Why is the network so slow?

• Do you have any way to find out?
  • Tools to check local host
  • Tools to check local network
  • Tools to check end-to-end path

Page 5: Rich Carlson

Underlying Assumption

• When problems exist, it’s the network’s fault!

Page 6: Rich Carlson

NDT Demo First

Page 7: Rich Carlson

Simple Network Picture

[Figure: Bob’s Host connected through the Network Infrastructure to Carol’s Host]

Page 8: Rich Carlson

Network Infrastructure

[Figure: network diagram with Switches 1 to 4 and routers R1 to R9]

Page 9: Rich Carlson

Possible Bottlenecks

• Network infrastructure

• Host computer

• Application design

Page 10: Rich Carlson

Network Infrastructure Bottlenecks

• Links too small
  • Using standard Ethernet instead of FastEthernet

• Links congested
  • Too many hosts crossing this link

• Scenic routing
  • End-to-end path is longer than it needs to be

• Broken equipment
  • Bad NIC, broken wire/cable, cross-talk

• Administrative restrictions
  • Firewalls, filters, shapers, restrictors

Page 11: Rich Carlson

Host Computer Bottlenecks

• CPU utilization
  • What else is the processor doing?

• Memory limitations
  • Main memory and network buffers

• I/O bus speed
  • Getting data into and out of the NIC

• Disk access speed

Page 12: Rich Carlson

Application Behavior Bottlenecks

• Chatty protocol
  • Lots of short messages between peers

• High reliability protocol
  • Send packet and wait for reply before continuing

• No run-time tuning options
  • Use only default settings

• Blaster protocol
  • Ignore congestion control feedback

Page 13: Rich Carlson

TCP 101

• Transmission Control Protocol (TCP)
  • Provides applications with a reliable in-order delivery service
  • The most widely used Internet transport protocol
  • Web, file transfers, email, P2P, remote login

• User Datagram Protocol (UDP)
  • Provides applications with an unreliable delivery service
  • RTP, DNS

Page 14: Rich Carlson

Summary – Part 1

• Problems can exist at multiple levels
  • Network infrastructure
  • Host computer
  • Application design

• Multiple problems can exist at the same time

• All problems must be found and fixed before things get better

Page 15: Rich Carlson

Summary – Part 2

• Every problem exhibits the same symptom
  • The application performance doesn’t meet the user’s expectations!

Page 16: Rich Carlson

Outline

• Why there is a problem

• What can be done to find/fix problems

• Tools you can use

• Ramblings on what’s next

Page 17: Rich Carlson

Real Life Examples

• I know what the problem is

• Bulk transfer with multiple problems

Page 18: Rich Carlson

Example 1 - SC’04 experience

• Booth having trouble getting application to run from Amsterdam to Pittsburgh

• Tests between remote SGI and local PC showed throughput limited to < 20 Mbps

• Assumption is: PC buffers too small

• Question: How do we set the WinXP send/receive window size?

Page 19: Rich Carlson

SC’04 Determine WinXP info

http://www.dslreports.com/drtcp

Page 20: Rich Carlson

SC’04 Confirm PC settings

• DrTCP reported 16 MB buffers, but test program still slow. Q: How to confirm?

• Run test to SC NDT server (PC has Fast Ethernet connection)
  • Client-to-Server: 90 Mbps
  • Server-to-Client: 95 Mbps
  • PC Send/Recv window size: 16 Mbytes (wscale 8)
  • NDT Send/Recv window size: 8 Mbytes (wscale 7)
  • Reported TCP RTT: 46.2 msec
  • Approximately 600 Kbytes of data in TCP buffer
  • Min window size / RTT: 1.3 Gbps

Page 21: Rich Carlson

SC’04 Local PC Configured OK

• No problem found

• Able to run at line rate

• Confirmed that PC’s TCP window values were set correctly

Page 22: Rich Carlson

SC’04 Remote SGI

• Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected)
  • Client-to-Server: 17 Mbps
  • Server-to-Client: 16 Mbps
  • SGI Send/Recv window size: 256 Kbytes (wscale 3)
  • NDT Send/Recv window size: 8 Mbytes (wscale 7)
  • Reported RTT: 106.7 msec
  • Min window size / RTT: 19 Mbps

Page 23: Rich Carlson

SC’04 Remote SGI Results

• Needed to download and compile command-line client

• SGI TCP window is too small to fill transatlantic pipe (19 Mbps max)

• User reluctant to make changes to SGI network interface from SC show floor

• NDT client tool allows the application to change its buffer size (setsockopt() function call)
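A minimal sketch of the per-application approach mentioned above: the program asks for larger socket buffers with setsockopt() instead of changing system-wide settings. This is an illustration in Python, not the NDT client's own code; the helper name open_tuned_socket and the 2 MB request are assumptions made for the example.

```python
import socket

def open_tuned_socket(buffer_bytes):
    """Create a TCP socket and request larger send/receive buffers.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffers before connect() so the window scale option
    # negotiated during the handshake can reflect the larger size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    # The OS may clamp or adjust the request, so read back what was granted.
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted_snd, granted_rcv

sock, snd, rcv = open_tuned_socket(2 * 1024 * 1024)  # assumed 2 MB request
print(f"send buffer: {snd} bytes, receive buffer: {rcv} bytes")
sock.close()
```

The granted values can be smaller than the request when the host enforces a system-wide maximum, which is why the sketch reads them back rather than trusting the request.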

Page 24: Rich Carlson

SC’04 Remote SGI (tuned)

• Re-run test from remote SGI to SC show floor
  • Client-to-Server: 107 Mbps
  • Server-to-Client: 109 Mbps
  • SGI Send/Recv window size: 2 Mbytes (wscale 5)
  • NDT Send/Recv window size: 8 Mbytes (wscale 7)
  • Reported RTT: 104 msec
  • Min window size / RTT: 153.8 Mbps
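The "Min window size / RTT" figures on the SC’04 slides can be reproduced with a short calculation: a TCP sender can have at most one window of data outstanding per round trip. The sketch below assumes decimal units (1 Mbyte = 1,000,000 bytes), which is the interpretation that matches the slide values; the function name is made up for this example.

```python
def window_limited_mbps(window_bytes, rtt_msec):
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / (rtt_msec / 1000.0) / 1e6

# Smaller of the two window sizes and the reported RTT from each test.
cases = {
    "Local PC (8 Mbytes, 46.2 msec)": (8_000_000, 46.2),
    "Remote SGI (256 Kbytes, 106.7 msec)": (256_000, 106.7),
    "Remote SGI tuned (2 Mbytes, 104 msec)": (2_000_000, 104.0),
}

for label, (window, rtt) in cases.items():
    print(f"{label}: {window_limited_mbps(window, rtt):.1f} Mbps")
```

The untuned SGI limit of roughly 19 Mbps sits just above the 16 to 17 Mbps that was measured, which is what points to the window size as the bottleneck.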

Page 25: Rich Carlson

SC’04 Debugging Results

• Team spent over 1 hour looking at Win XP config, trying to verify window size

• Single NDT test verified this in under 30 seconds

• 10 minutes to download and install NDT client on SGI

• 15 minutes to discuss options and run client test with set buffer option

Page 26: Rich Carlson

SC’04 Debugging Results

• 8 minutes to find SGI limits and determine maximum allowable window setting (2 MB)

• Total time 34 minutes to verify problem was with the remote SGI’s TCP send/receive window size

• Network path verified, but the application still performed poorly until it was also tuned

Page 27: Rich Carlson

Example 2 – SCP file transfer

• Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take?
  • 5 minutes?
  • 1 minute?
  • 5 seconds?

Page 28: Rich Carlson

What should we expect?

• Assumptions:
  • 100 Mbps Fast Ethernet is the slowest link
  • 50 msec round trip time

• Bob & Carol calculate:
  • 50 MB * 8 = 400 Mbits
  • 400 Mb / 100 Mb/sec = 4 seconds
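Bob and Carol's estimate, written out as a small calculation. It is a best-case figure: it assumes the slowest link is fully available and ignores protocol overhead and TCP slow start. The function name is only illustrative.

```python
def ideal_transfer_seconds(file_bytes, link_bps):
    """Best-case transfer time: file size in bits divided by link rate."""
    return file_bytes * 8 / link_bps

# 50 MB file over a 100 Mbps Fast Ethernet bottleneck
print(ideal_transfer_seconds(50_000_000, 100_000_000))  # 4.0
```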

Page 29: Rich Carlson

Initial SCP Test Results

Page 30: Rich Carlson

Initial Test Results

• This is unacceptable!

• First look for network infrastructure problem
  • Use NDT tester to examine both hosts

Page 31: Rich Carlson

Initial NDT testing shows Duplex Mismatch at one end

Page 32: Rich Carlson

NDT Found Duplex Mismatch

• Investigation finds that the switch port is configured for 100 Mbps Full-Duplex operation.

• Network administrator corrects configuration and asks for re-test

Page 33: Rich Carlson

Duplex Mismatch Corrected

Page 34: Rich Carlson

SCP results after Duplex Mismatch Corrected

Page 35: Rich Carlson

Intermediate Results

• Time dropped from 18 minutes to 40 seconds.

• But our calculations said it should take 4 seconds!
  • 400 Mb / 40 sec = 10 Mbps
  • Why are we limited to 10 Mbps?
  • Are you satisfied with 1/10th of the possible performance?

Page 36: Rich Carlson

Default TCP window settings

Page 37: Rich Carlson

Calculating the Window Size

• Remember Bob found the round-trip time was 50 msec

• Calculate window size limit
  • 85.3 KB * 8 b/B = 698777 b
  • 698777 b / .050 s = 13.98 Mbps

• Calculate new window size
  • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB
  • Use 1 MB as a minimum
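The two calculations on this slide can be checked with a few lines of arithmetic. The sketch assumes KB means 1024 bytes, which is the reading that reproduces the 698777 and 610.3 figures; the function names are made up for the example.

```python
def window_limit_mbps(window_bytes, rtt_sec):
    """Throughput ceiling imposed by a TCP window: one window per RTT."""
    return window_bytes * 8 / rtt_sec / 1e6

def required_window_bytes(link_bps, rtt_sec):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_sec / 8

rtt = 0.050                    # 50 msec round-trip time
default_window = 85.3 * 1024   # 85.3 KB default window, in bytes

print(round(window_limit_mbps(default_window, rtt), 2))         # 13.98
print(round(required_window_bytes(100e6, rtt) / 1024, 1))       # 610.4
```

The second value is 610.35 KB before rounding, which the slide shows truncated as 610.3 KB. Rounding the window up to 1 MB leaves headroom above the bandwidth-delay product.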

Page 38: Rich Carlson

Resetting Window Value

Page 39: Rich Carlson

With TCP windows tuned

Page 40: Rich Carlson

Steps so far

• Found and fixed Duplex Mismatch
  • Network infrastructure problem

• Found and fixed TCP window values
  • Host configuration problem

• Are we done yet?

Page 41: Rich Carlson

SCP results with tuned windows

Page 42: Rich Carlson

Intermediate Results

• SCP still runs slower than expected
  • Hint: SCP uses internal buffers
  • Patch available from PSC

Page 43: Rich Carlson

SCP Results with tuned SCP

Page 44: Rich Carlson

Final Results

• Fixed infrastructure problem

• Fixed host configuration problem

• Fixed application configuration problem
  • Achieved target time of 4 seconds to transfer 50 MB file over 2000 miles

Page 45: Rich Carlson

Why is it hard to Find/Fix Problems?

• Network infrastructure is complex

• Network infrastructure is shared

• Network infrastructure consists of multiple components

Page 46: Rich Carlson

Shared Infrastructure

• Other applications accessing the network
  • Remote disk access
  • Automatic email checking
  • Heartbeat facilities

• Other computers are attached to the closet switch
  • Uplink to campus infrastructure

• Other users on and off site
  • Uplink from campus to gigapop/backbone

Page 47: Rich Carlson

Other Network Components

• DHCP (Dynamic Host Configuration Protocol)
  • At least 2 packets exchanged to configure your host

• DNS (Domain Name System)
  • At least 2 packets exchanged to translate FQDN into IP address

• Network Security Devices
  • Intrusion Detection, VPN, Firewall

Page 48: Rich Carlson

Network Infrastructure

• Large complex system with potentially many problem areas

Page 49: Rich Carlson

Why is it hard to Find/Fix Problems?

• Computers have multiple components

• Each Operating System (OS) has a unique set of tools to tune the network stack

• Application Appliances come with few knobs and limited options

Page 50: Rich Carlson

Computer Components

• Main CPU (clock speed)

• Front & Back side bus

• Main Memory

• I/O Bus (ATA, SCSI, SATA)

• Disk (access speed and size)

Page 51: Rich Carlson

Computer Issues

• Lots of internal components with multi-tasking OS

• Lots of tunable TCP/IP parameters that need to be ‘right’ for each possible connection

Page 52: Rich Carlson

Why is it hard to Find/Fix Problems?

• Applications depend on default system settings

• Problems scale with distance

• More access to remote resources

Page 53: Rich Carlson

Default System Settings

• For Linux 2.6.13 there are:
  • 11 tunable IP parameters
  • 45 tunable TCP parameters
  • 148 Web100 variables (TCP MIB)
  • Currently no OS ships with default settings that work well over trans-continental distances

• Some applications allow run-time setting of some options
  • 30 settable/viewable IP parameters
  • 24 settable/viewable TCP parameters
  • There are no standard ways to set run-time option ‘flags’

Page 54: Rich Carlson

Application Issues

• Setting tunable parameters to the ‘right’ value

• Getting the protocol ‘right’

Page 55: Rich Carlson

How do you set realistic Expectations?

• Assume network bandwidth exists or find out what the limits are
  • Local LAN connection
  • Site access link

• Monitor the link utilization occasionally
  • Weathermap
  • MRTG graphs

• Look at your host config/utilization
  • What is the CPU utilization?

Page 56: Rich Carlson

Ethernet, FastEthernet, Gigabit Ethernet

• 10/100/1000 auto-sensing NICs are common today

• Most campuses have installed 10/100 switched infrastructure

• Access network links are currently the limiting factor in most networks

• Backbone networks are 10 Gigabit/sec

Page 57: Rich Carlson

Site Access and Backbone

• Campus access via Regional ‘GigaPoP’
  • Confirm with campus admin

• Abilene Backbone
  • 10 Gbps POS links coast-to-coast

• Other Federal backbone networks

• Other commercial networks

• Other institutions, sites, and networks

Page 58: Rich Carlson

Tools, Tools, Tools

• Ping
• Traceroute
• Iperf
• Tcpdump
• Tcptrace
• BWCTL
• NDT
• OWAMP
• AMP
• Advisor
• Thrulay
• Web100
• MonaLisa
• pathchar
• NPAD
• Pathdiag
• Surveyor
• Ethereal
• CoralReef
• MRTG
• Skitter
• Cflowd
• Cricket
• Net100

Page 59: Rich Carlson

Active Measurement Tools

• Tools that inject packets into the network to measure some value
  • Available bandwidth
  • Delay/jitter
  • Loss

• Requires bi-directional traffic or synchronized hosts

Page 60: Rich Carlson

Passive Measurement Tools

• Tools that monitor existing traffic on the network and extract some information
  • Bandwidth used
  • Jitter
  • Loss rate

• May generate some privacy and/or security concerns

Page 61: Rich Carlson

Abilene Weather Map

Page 62: Rich Carlson

MRTG Graphs

Page 63: Rich Carlson

Windows XP Performance

Page 64: Rich Carlson

Outline

• Why there is a problem

• What can be done to find/fix problems

• Tools you can use

• Ramblings on what’s next

Page 65: Rich Carlson

Focus on 3 tools

• Existing NDT tool
  • Allows users to test network path for a limited number of common problems

• Existing NPAD tool
  • Allows users to test local network infrastructure while simulating a long path

• Emerging PerfSonar tool
  • Allows users to retrieve network path data from major national and international REN networks

Page 66: Rich Carlson

Network Diagnostic Tool (NDT)

• Measure performance to the user’s desktop

• Identify real problems for real users
  • Network infrastructure is the problem
  • Host tuning issues are the problem

• Make tool simple to use and understand

• Make tool useful for users and network administrators

Page 67: Rich Carlson

NDT user interface

• Web-based Java applet allows testing from any browser

• Command-line client allows testing from remote login shell

Page 68: Rich Carlson

NDT test suite

• Looks for specific problems that affect a large number of users
  • Duplex Mismatch
  • Faulty cables
  • Bottleneck link capacity
  • Achievable throughput
  • Ethernet duplex setting
  • Congestion on this network path

Page 69: Rich Carlson

Duplex Mismatch Detection

• Developing analytical model to describe how network operates (no prior art?)

• Expanding model to describe UDP and TCP flows

• Test models in LAN, MAN, and WAN environments

NIH/NLM grant funding

Page 70: Rich Carlson

Four Cases of Duplex Setting

FD-FD FD-HD

HD-FD HD-HD

Page 71: Rich Carlson

Bottleneck Link Detection

• What is the slowest link in the end-to-end path?
  • Monitors packet arrival times using libpcap routine
  • Use TCP dynamics to create packet pairs
  • Quantize results into link type bins (no fractional or bonded links)

Cisco URP grant work
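A rough sketch of the packet-pair idea behind the bullets above: two packets sent back to back are spread apart by the slowest link, so the gap between their arrival times gives an estimate of that link's rate, which can then be snapped to the nearest link type. This is not NDT's implementation. The bin table, the use of the median gap, and the function name are all assumptions made for illustration, and the arrival times are taken as given rather than captured with libpcap.

```python
import math
from statistics import median

# Hypothetical link-type bins (nominal rates in bits/sec).
LINK_BINS = {
    "Ethernet (10 Mbps)": 10e6,
    "FastEthernet (100 Mbps)": 100e6,
    "Gigabit Ethernet (1 Gbps)": 1e9,
    "10 Gigabit Ethernet (10 Gbps)": 10e9,
}

def bottleneck_estimate(arrival_times, packet_bytes):
    """Estimate bottleneck rate from back-to-back packet arrival times.

    arrival_times: arrival timestamps in seconds, in arrival order.
    Returns (estimated rate in bits/sec, name of the closest bin).
    """
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:]) if b > a]
    if not gaps:
        raise ValueError("need at least two packets with distinct arrival times")
    # The median gap is less sensitive to a few pairs disturbed by cross traffic.
    rate_bps = packet_bytes * 8 / median(gaps)
    # Pick the bin closest on a log scale, since link rates differ by factors of 10.
    name = min(LINK_BINS, key=lambda n: abs(math.log(rate_bps / LINK_BINS[n])))
    return rate_bps, name

# Four 1500-byte packets arriving 0.12 msec apart
rate, link = bottleneck_estimate([0.0, 0.00012, 0.00024, 0.00036], 1500)
print(f"{rate / 1e6:.0f} Mbps -> {link}")
```

Quantizing into fixed bins is what makes fractional or bonded links invisible to this kind of test, as the slide notes.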

Page 72: Rich Carlson

Normal congestion detection

• Shared network infrastructures will cause periodic congestion episodes
  • Detect/report when TCP throughput is limited by cross traffic
  • Detect/report when TCP throughput is limited by own traffic

Page 73: Rich Carlson

Faulty Hardware/Link Detection

• Detect non-congestive loss due to
  • Faulty NIC/switch interface
  • Bad Cat-5 cable
  • Dirty optical connector

• Preliminary work shows that it is possible to distinguish between congestive and non-congestive loss

Page 74: Rich Carlson

Full/Half Link Duplex setting

• Detect half-duplex link in E2E path
  • Identify when throughput is limited by half-duplex operations

• Preliminary work shows detection possible when link transitions between blocking states

Page 75: Rich Carlson

Finding Results of Interest

• Duplex Mismatch
  • This is a serious error and nothing will work right. Reported on main page and on Statistics page

• Packet Arrival Order
  • Inferred value based on TCP operation. Reported on Statistics page (with loss statistics) and as the order: value on More Details page

Page 76: Rich Carlson

Finding Results of Interest

• Packet Loss Rates
  • Calculated value based on TCP operation. Reported on Statistics page (with out-of-order statistics) and as the loss: value on More Details page

• Path Bottleneck Capacity
  • Measured value based on TCP operation. Reported on main page

Page 77: Rich Carlson

Additional Functions and Features

• Provide basic tuning information

• Basic features
  • Basic configuration file
  • FIFO scheduling of tests
  • Simple server discovery protocol
  • Federation mode support
  • Command line client support

• Created sourceforge.net project page

Page 78: Rich Carlson

NPAD/pathdiag

• A new tool from researchers at Pittsburgh Supercomputing Center

• Finds problems that affect long network paths

• Uses Web100-enhanced Linux based server

• Web based Java client

Page 79: Rich Carlson

Long Path Problem

• E2E application performance is dependent on distance between hosts

• Full size frame time at 100 Mbps
  • Frame = 1500 Bytes
  • Time = 0.12 msec
  • In flight for 1 msec RTT = 8 packets
  • In flight for 70 msec RTT = 583 packets
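The in-flight numbers above follow from dividing the round-trip time by the time it takes to put one full-size frame on the wire. A small sketch of that arithmetic (function names are illustrative, and framing overhead is ignored as on the slide):

```python
def frame_time_msec(frame_bytes=1500, link_bps=100e6):
    """Time to serialize one frame onto the link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000

def packets_in_flight(rtt_msec, frame_bytes=1500, link_bps=100e6):
    """Whole frames that fit into one round-trip time at full link rate."""
    return int(rtt_msec / frame_time_msec(frame_bytes, link_bps))

print(round(frame_time_msec(), 2))   # 0.12
print(packets_in_flight(1))          # 8
print(packets_in_flight(70))         # 583
```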

Page 80: Rich Carlson

Long Path Problem

[Figure: the network diagram (Switches 1 to 4, routers R1 to R9) with hosts H1, H2, and H3 attached. RTT is 1 msec between H1 and H2 and 70 msec between H1 and H3.]

Page 81: Rich Carlson

TCP Congestion Avoidance

• Cut number of packets by ½

• Increase by 1 per RTT

• LAN (RTT = 1 msec)
  • In flight changes to 4 packets
  • Time to increase back to 8 is 4 msec

• WAN (RTT = 70 msec)
  • In flight changes to 292 packets
  • Time to increase back to 583 is 20.4 seconds
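The recovery times above can be reproduced by counting round trips: after a loss the number of packets in flight is halved, then grows by one packet per RTT until it is back at the original value. The sketch below rounds the halved value up so that it matches the 292 on the slide; that rounding choice and the function name are assumptions.

```python
import math

def recovery_after_loss(packets_in_flight, rtt_msec):
    """Return (packets in flight after halving, seconds to grow back).

    Models additive increase of one packet per round trip.
    """
    halved = math.ceil(packets_in_flight / 2)
    rtts_needed = packets_in_flight - halved
    return halved, rtts_needed * rtt_msec / 1000.0

for label, packets, rtt in [("LAN", 8, 1), ("WAN", 583, 70)]:
    halved, seconds = recovery_after_loss(packets, rtt)
    print(f"{label}: drops to {halved} packets, {seconds:.3f} s to recover")
```

A single loss costs the LAN flow 4 msec but costs the WAN flow about 20.4 seconds, which is why the same loss rate hurts long paths far more.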

Page 82: Rich Carlson

PerfSonar – Next Steps in Performance Monitoring

• New initiative involving multiple partners
  • ESnet (DOE labs)
  • GEANT (European Research and Education network)
  • Internet2 (Abilene and connectors)

Page 83: Rich Carlson

PerfSonar – Router stats on a path

• Demo ESnet tool

https://performance.es.net/cgi-bin/perfsonar-trace.cgi

Paste output from Traceroute into the window and view the MRTG graphs for the routers in the path

Author: Joe Metzger ESnet

Page 84: Rich Carlson

Traceroute Visualizer

Page 85: Rich Carlson

The Wizard Gap*

* Courtesy of Matt Mathis (PSC)

Page 86: Rich Carlson

Google it!

• Enter “tuning tcp” into the Google search engine.

• Top 2 hits are:
  • http://www.psc.edu/networking/perf_tune.html
  • http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html

Page 87: Rich Carlson

PSC Tuning Page

Page 88: Rich Carlson

LBNL Tuning Page

Page 89: Rich Carlson

Internet2 Land Speed Record

• Challenge to community to demonstrate how to run fast, long-distance flows

• 2000 record – 751 Mbps over 5,262 km

• 2005 record - 7.2 Gbps over 30,000 km

Page 90: Rich Carlson

Conclusions

• Applications can fully utilize the network

• All problems have a single symptom
  • All problems must be found and fixed before things get better
  • Some people stop investigating before finding all problems

• Tools exist, and more are being developed, to make it easier to find problems

Page 91: Rich Carlson

• Extra Material

Page 92: Rich Carlson

Outline

• Why there is a problem

• What can be done to find/fix problems

• Tools you can use

• Ramblings on what’s next

Page 93: Rich Carlson

Introduction

• Where have we been and where are we headed?
  • Technology and hardware
  • Transport protocols

Page 94: Rich Carlson

Basic Assumption

• The Internet was designed to improve communications between people

Page 95: Rich Carlson

What does the future hold?

• Moore’s Law shows no signs of slowing down
  • The original law says the number of transistors on a chip doubles every 18 months
  • Now it simply means that everything gets faster

Page 96: Rich Carlson

PC Hardware

• CPU processing power (flops) is increasing

• Front/back side bus clock rate is increasing

• Memory size is increasing

• HD size is increasing too
  • For the past 10 years, every HD I’ve purchased cost $130

Page 97: Rich Carlson

Scientific Workstation

• PC or Sparc class computer
  • Fast CPU
  • 1 GB RAM
  • 1 TB disk
  • 10 Gbps NIC

• Today’s cost ~ $5,000

Page 98: Rich Carlson

Network Capability

• LAN networks (includes campus)

• MAN/RON network

• WAN network

• Remember the 80/20 rule

Page 99: Rich Carlson

Network NIC costs

• 10 Mbps NICs were $50 - $150 circa 1985

• 100 Mbps NICs were $50 - $150 circa 1995

• 1,000 Mbps NICs are $50 - $150 circa 2005

• 10 Gbps NICs are $1,500 - $2,500 today

• Note today 10/100/1000 cards are common and 10/100 cards are < $10

Page 100: Rich Carlson

Ethernet Switches

• Unmanaged 5 port 10/100 switch ~ $25.00

• Unmanaged 5 port 10/100/1000 switch ~ $50

• Managed switches have more ports and are more expensive ($150 - $400 per port)

Page 101: Rich Carlson

Network Infrastructure

• Campus

• Regional

• National

• International

Page 102: Rich Carlson

Campus Infrastructure

• Consists of switches, routers, and cables

• Limited funds make it hard to upgrade

Page 103: Rich Carlson

Regional Infrastructure

• Many states have optical networks
  • Illinois has I-Wire

• Metro area optical gear is ‘reasonably’ priced

• Move by some to own fiber

• Flexible way to cut operating costs, but requires larger up-front investment

Page 104: Rich Carlson

National Infrastructure

• Commercial vendors have pulled fiber to major metro areas

• NLR – n x 10 Gbps

• Abilene - 1 x 10 Gbps (Qwest core)

• FedNets - (DoE, DoD, and NASA all run national networks)

• CA*net – n x 10 Gbps

• Almost 500 Gbps into SC|05 conference in Seattle

Page 105: Rich Carlson

International Infrastructure

• Multiple trans-atlantic 10 Gbps links

• Multiple trans-pacific 10 Gbps links

• Gloriad

Page 106: Rich Carlson

Interesting sidebar

• China’s demand for copper, aluminum, and steel has caused an increase in theft
  • Manhole covers
  • Street lamps
  • Parking meters
  • Phone cable

• One possible solution is to replace copper wires with FTTH solutions

Page 107: Rich Carlson

Transport Protocol

• TCP Reno has known problems with loss at high speeds
  • Linear growth following packet loss
  • No memory of past achievements

• TCP research groups are actively working on solutions:
  • HighSpeed-TCP, Scalable-TCP, Hamilton-TCP, BIC, CUBIC, FAST, UDT, Westwood+
  • Linux (2.6.13) has run-time support for these stacks

Page 108: Rich Carlson

What drives prices?

• Electronic component prices are driven by units produced
  • Try buying a brand NEW i386 CPU
  • Try upgrading your PC’s CPU
  • NICs are no different