Buff Goldberg Speeding Up TCP


8/8/2019 Buff Goldberg Speeding Up TCP

http://slidepdf.com/reader/full/buff-goldberg-speeding-up-tcp 1/12

Web Servers Should Turn Off Nagle to Avoid Unnecessary 200 ms Delays+

Robert Buff, Arthur Goldberg

Computer Science Department

Courant Institute of Mathematical Sciences

New York University

{buff, artg}@cs.nyu.edu

www.cs.nyu.edu/cs/faculty/artg

1 Abstract

We show that the silly window syndrome (SWS) avoidance algorithms in standard implementations of TCP significantly slow the Web protocols HTTPS and HTTP in certain circumstances. Substantial delays of several hundred milliseconds may occur on fast Intranet transactions that might otherwise complete in a few tens of milliseconds.

We illustrate this performance bug with TCP packet traces from test programs and production Web systems. This bug is easily and reliably avoided by disabling Nagle’s algorithm at the sender on every connection.

2 Introduction

Current TCP implementations deliver high bandwidth when transmitting large segments. Less attention has been paid to the response time of TCP transactions that exchange smaller segments. However, this response time is important because widespread applications, like the Web, employ such transactions.

2.1 TCP Review 

TCP supports a reliable, full-duplex network transport byte stream. TCP usually packetizes application data into segments that fit into a single IP packet. The largest TCP segment that can be sent on a connection holds maximum segment size (MSS) bytes and is called an MSS segment.

HTTPS and HTTP are client/server protocols that use TCP. Typically, a client/server interaction (or transaction) consists of a small request message sent by the client to the server and a response message sent back. Depending on the implementation, an application message is sent with a single TCP socket write or with several. TCP may map application data to segment boundaries arbitrarily.

A TCP receiver acknowledges receipt of data by returning to the sender the sequence number of the next expected byte. In addition, a receiver manages buffer space by advertising an available ‘window’ beyond the data it has received.

2.1.1 The Silly Window Syndrome

As described in RFC 1122 [Braden 89] and Section 13.29 of [Comer 95], early TCP implementations exhibited a problem known as the silly window syndrome (SWS). In SWS a connection reaches a steady state in which each acknowledgement advertises a small window and each data segment carries a small amount of data. SWS occurs, for example, when the receiving application repeatedly reads just one byte from a connection whose advertised window has closed, so that the receiver reopens the window one byte at a time.

The TCP standard [Braden 89] requires both senders and receivers to incorporate algorithms that avoid SWS. In brief, a receiver avoids advertising small TCP windows and delays transmitting acknowledgements. A sender implements the Nagle algorithm, which delays transmission of partially filled segments until all previously transmitted data has been acknowledged. For more detail on SWS avoidance, we review the TCP specification.

+ Submitted to “Protocols for High Speed Networking ‘99”


2.1.2 The TCP Specification

The TCP specification describes how the receiver and sender avoid SWS. Section 4.2.3.2 states that at the receiver:

  A TCP SHOULD implement a delayed ACK, but an ACK should not be excessively delayed; in particular, the delay MUST be less than 0.5 seconds, and in a stream of full-sized segments there SHOULD be an ACK for at least every second segment.

Therefore a receiver may always delay acknowledging a single partial segment, until its delayed-ACK timer expires (which must happen within 0.5 seconds).
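The receiver-side rule can be summarized as a small decision function. The sketch below is our simplified model, not code from any TCP implementation: it ignores piggybacking the ACK on outgoing data and other immediate-ACK triggers, and the parameter names and the 200 ms default timeout (the Win32 value discussed in Section 2.2) are assumptions of the sketch.

```python
def should_ack_now(unacked_bytes: int, unacked_full_segments: int,
                   waited_ms: float, timeout_ms: float = 200.0) -> bool:
    """Simplified delayed-ACK decision modelled on RFC 1122, section 4.2.3.2.

    Assumption: the receiver has no outgoing data to piggyback the ACK on.
    """
    if not 0.0 < timeout_ms < 500.0:
        # RFC 1122: the delay MUST be less than 0.5 seconds.
        raise ValueError("delayed-ACK timeout must be below 500 ms")
    if unacked_bytes == 0:
        return False  # nothing to acknowledge
    if unacked_full_segments >= 2:
        return True   # ACK at least every second full-sized segment
    # A lone partial segment simply waits for the delayed-ACK timer.
    return waited_ms >= timeout_ms
```

A single 10-byte segment therefore stays unacknowledged for the whole timeout: `should_ack_now(10, 0, 5.0)` is `False`, and only `should_ack_now(10, 0, 200.0)` is `True`.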

Section 4.2.3.3 says that

  A TCP MUST include a SWS avoidance algorithm in the receiver. […] The receiver's SWS avoidance algorithm determines when the right window edge may be advanced; […]

For realistic receive buffers (greater than twice the MSS), window advances are announced in increments of MSS.
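RFC 1122 suggests a concrete receiver rule: keep the right window edge fixed until the space freed but not yet advertised, RCV.BUFF − RCV.USER − RCV.WND, reaches min(Fr × RCV.BUFF, Eff.snd.MSS), with Fr = 1/2 recommended. The sketch below encodes that test (the Python names are ours). With a buffer larger than twice the MSS the threshold is simply the MSS, which is why window advances come in MSS increments.

```python
def advance_threshold(rcv_buff: int, eff_snd_mss: int, fr: float = 0.5) -> float:
    """Smallest window increment the receiver will announce (RFC 1122, 4.2.3.3)."""
    return min(fr * rcv_buff, eff_snd_mss)


def may_advance_window(rcv_buff: int, rcv_user: int, rcv_wnd: int,
                       eff_snd_mss: int, fr: float = 0.5) -> bool:
    """True once the freed-but-unadvertised space reaches the threshold.

    rcv_buff: total receive buffer size in bytes
    rcv_user: bytes received but not yet read by the application
    rcv_wnd:  window currently offered to the sender
    """
    freed = rcv_buff - rcv_user - rcv_wnd
    return freed >= advance_threshold(rcv_buff, eff_snd_mss, fr)
```

For example, with an 8760-byte buffer that has filled up (window 0) and a 1460-byte MSS, a receiver whose application has drained 1000 bytes keeps the window closed; it reopens the window only after 1460 bytes have been consumed.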

Section 4.2.3.4, “When to Send Data” says that

  A TCP MUST include a SWS avoidance algorithm in the sender. […] A TCP SHOULD implement the Nagle Algorithm [Nagle 84] to coalesce short segments. However, there MUST be a way for an application to disable the Nagle algorithm on an individual connection. […]

  The Nagle algorithm is generally as follows: If there is unacknowledged data […] then the sending TCP buffers all user data […] until the outstanding data has been acknowledged or until the TCP can send a full-sized segment […]

If Nagle is enabled, the application writes less than MSS to the socket while earlier data is still unacknowledged, and the receiver delays its acknowledgement, then the sending TCP holds the new data until that delayed acknowledgement arrives.

The specification also says:

  To avoid a resulting deadlock, it is necessary to have a timeout to force transmission of data […].

but in all the traces we collected, the delayed-acknowledgement timer appears to expire before any such Nagle timeout.
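The sender-side rule quoted above reduces to a short predicate. This is a simplified sketch of the RFC 1122 summary with parameter names of our own; it ignores the send window, the PSH flag and the override timeout mentioned in the specification.

```python
def nagle_may_send(pending_bytes: int, unacked_bytes: int, mss: int,
                   nodelay: bool = False) -> bool:
    """May the sender transmit a segment now? (simplified Nagle check)"""
    if pending_bytes == 0:
        return False              # nothing buffered
    if nodelay:
        return True               # TCP_NODELAY set: Nagle is disabled
    if pending_bytes >= mss:
        return True               # a full-sized segment may always be sent
    return unacked_bytes == 0     # a small segment only when nothing is in flight
```

The second 10-byte write of the test application in Section 4 is the `nagle_may_send(10, 10, 1460)` case: it returns `False`, so the data waits for the acknowledgement of the first write.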

2.2 HTTPS and HTTP Performance Problems

In several situations HTTPS and HTTP trigger SWS avoidance in both the sender and receiver, thereby creating substantial delays. The application-layer situations were the following:

• HTTPS / SSL key exchange, new and reused session key: The server writes two small messages and blocks waiting for a response; the browser reads both messages and responds.

• HTTPS / SSL key exchange, reused session key: The same situation, but with directions reversed. The browser writes two small messages and blocks waiting for a response; the server reads both messages and responds.

• HTTP image (GIF) smaller than MSS: The server sends the HTTP response in two small separate writes, containing the headers and the body (image data), respectively.

All three cases lead to the same TCP situation. The sender transmits the first message in a separate segment, then waits for its acknowledgement. It transmits the second message in a separate segment when the acknowledgment arrives. The receiver receives the first segment, but delays the acknowledgment because the segment is partial and the window available to advertise is less than MSS. Eventually, a timeout triggers the acknowledgment, thus causing the sender to send the second segment.
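The timing of this common situation can be sketched with a back-of-the-envelope model. Assuming a fixed one-way latency, a receiver with nothing to piggyback its acknowledgement on, and negligible processing time (all assumptions of this sketch), the second message arrives three one-way latencies plus one ack delay after the first write when Nagle is on, but after a single one-way latency when it is off.

```python
def two_write_timeline(one_way_ms, ack_delay_ms, nagle=True):
    """Return (event, time_ms) pairs for two back-to-back writes of < MSS bytes each."""
    events = [("seg1 sent", 0), ("seg1 received", one_way_ms)]
    if not nagle:
        # Without Nagle the second segment follows the first immediately.
        events += [("seg2 sent", 0), ("seg2 received", one_way_ms)]
        return events
    ack_sent = one_way_ms + ack_delay_ms    # receiver's delayed-ACK timer fires
    ack_received = ack_sent + one_way_ms    # Nagle now releases the second segment
    events += [
        ("ack sent", ack_sent),
        ("ack received", ack_received),
        ("seg2 sent", ack_received),
        ("seg2 received", ack_received + one_way_ms),
    ]
    return events


for nagle in (True, False):
    done = dict(two_write_timeline(10, 200, nagle))["seg2 received"]
    print(f"Nagle {'on' if nagle else 'off'}: second message arrives at {done} ms")
```

With the 10 ms latency and 200 ms Win32 ack delay assumed in Table 10, the model reproduces that table's 230 ms entry for two writes with Nagle on; for Nagle off it gives the lower end of the table's 10 to 20 ms range.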


The avoidable delay is determined by the delayed-ack timeout. As documented by [Microsoft 97] and confirmed by our measurements, Win32 TCP typically delays acks by 200 ms. Our measurements indicate that Sun’s Solaris delays acks by about 45 ms on average.

3 Related Work

Network designers understood that delayed acknowledgements may slow application communications, as described in [Stevens 98], [Microsoft 98] and [Sun 98].

[Heidemann 97] discusses this problem in persistent HTTP. He states that the problem does not occur in HTTP versions 1.0 and earlier, but we find that it does.

Microsoft acknowledges the problem [Microsoft 97], [Microsoft 98], but indicates that it occurs only when making small sends.

[Nielsen 98] and [Nielsen 97] measure the cumulative performance of a set of accesses to a representative Web site. In [Nielsen 98], enabling Nagle in an HTTP/1.1 server slows performance:

Situation    Time (sec)    Time (sec)
Nagle        0.48          0.27
No Nagle     0.45          0.21

However, the client ran on a Digital Alpha station 400 4/233 under UNIX 4.0a rather than on Win32, which is our concern.

4 The lab experiments

4.1 A simple client-server test application

Before discussing production cases, we analyze the performance behavior of a simple client-server laboratory application that triggers the bug. The test application runs 100 identical transactions. Each transaction consists of the exchange of a 10-byte client request and a 20-byte server response. The 20-byte response is written to the socket by two 10-byte write() calls. No computational overhead is involved. All communication is strictly sequential: the client only initiates a subsequent transaction after the entire 20-byte server response has been received.

We ran the same test application on Win32, Solaris and Linux, in different client/server combinations. The source code remained unchanged for all systems. The application uses the standard Berkeley socket interface. There is no delay between the two writes. The fragmentation created by the two writes in the server's application layer is preserved by the lower layers, so each write is transmitted as a separate TCP segment.

In half of our tests, the Nagle algorithm was activated on the server; in the other half, it was deactivated. To turn Nagle on or off, the setsockopt() system call was used with the TCP_NODELAY option.
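A minimal Python sketch of such a test application follows. It is not the authors' program (its source code is not given): the loopback address, the payload contents, and the function names are our assumptions. As in the experiments, the server issues two 10-byte writes per response and Nagle is controlled per connection with the TCP_NODELAY socket option; on a modern loopback interface the measured times need not resemble those reported in the tables that follow.

```python
import socket
import threading
import time

REQUEST = b"Q" * 10   # 10-byte client request
HALF = b"R" * 10      # each half of the 20-byte response is one 10-byte write


def set_nagle(sock, enabled):
    # TCP_NODELAY = 1 disables Nagle; 0 (the default) leaves it enabled.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def recv_exactly(sock, n):
    """Read exactly n bytes; TCP may deliver them in several pieces."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data += chunk
    return data


def serve(listener, transactions, nagle):
    conn, _ = listener.accept()
    with conn:
        set_nagle(conn, nagle)
        for _ in range(transactions):
            recv_exactly(conn, len(REQUEST))
            conn.sendall(HALF)  # first small write
            conn.sendall(HALF)  # second small write, issued immediately


def mean_transaction_ms(transactions=100, nagle=True):
    """Run strictly sequential transactions and return the mean time in ms."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve, args=(listener, transactions, nagle))
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            start = time.perf_counter()
            for _ in range(transactions):
                client.sendall(REQUEST)
                if recv_exactly(client, 2 * len(HALF)) != HALF * 2:
                    raise ValueError("unexpected response")
            elapsed = time.perf_counter() - start
        server.join()
    return 1000.0 * elapsed / transactions
```

Comparing `mean_transaction_ms(nagle=True)` with `mean_transaction_ms(nagle=False)` mirrors the experimental design; whether a delay appears depends on the receiving TCP's delayed-ACK policy.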

4.2 Experimental setup

Table 1 lists the clients and servers used in our experiments. All computers except Win98 are connected to the same 100 Mbit Ethernet. Win98 is connected to a 100 Mbit Ethernet separated from the others by one router. Network congestion was insignificant during our experiments. IP segment traces were collected with Network General's NetXRay network monitoring tool. The traces appear complete and accurate.


Name Processor Operating System

NT4Wa Pentium, 100 MHz NT Workstation 4.00.1381

NT4Wb Pentium, 200 MHz NT Workstation 4.00.1381

NT4S Pentium, 233 MHz NT Server 4.00.1381, SP 3

NT5S Pentium, 233 MHz NT Server 5.00.1671, beta 1

Win95 Pentium, 100 MHz Windows 95

Win98 Pentium, 90 MHz Windows 98

Linux i486, 66 MHz Linux 2.0.31

Solaris Sun SPARCstation 5 SunOS 5.6

Table 1. Test machines and operating systems.

The test application was run on these six pairs of machines from Table 1:

NT4Wa/NT4S, NT4Wb/Win95, NT4Wb/Win98, NT4Wa/NT5S, NT4Wa/Solaris, NT4Wa/Linux

Although the focus was on covering all Win32 implementations (NT 4 Workstation and Server, NT 5 beta, Windows 95 and 98), we also tested a Win32/Linux and a Win32/Solaris configuration. In each of the six combinations, each partner acted as both client and server in two successive executions. Each execution was run twice, once with the Nagle algorithm enabled and once with it disabled. In total, the test application was run 6 × 2 × 2 = 24 times.
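The total follows from enumerating the design. In the short sketch below (the tuple layout is ours), each machine pair is run in both client/server role assignments and with both Nagle settings.

```python
from itertools import product

# The six machine pairs listed above.
PAIRS = [("NT4Wa", "NT4S"), ("NT4Wb", "Win95"), ("NT4Wb", "Win98"),
         ("NT4Wa", "NT5S"), ("NT4Wa", "Solaris"), ("NT4Wa", "Linux")]

runs = []
for (a, b), swapped, nagle_on in product(PAIRS, (False, True), (True, False)):
    # Each partner acts once as client and once as server.
    client, server = (b, a) if swapped else (a, b)
    runs.append((client, server, nagle_on))

print(len(runs))  # 24
```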

In the following sections, we present two traces with lengthy ack delays of about 45 ms for Solaris and 190 ms for Win32, respectively. Then we show a recorded trace of an execution without a lengthy ack delay between the first and second server responses. Finally, we give a performance summary of all 24 executions.

4.3 TCP segment traces: lengthy ack delays

The following two traces exhibit lengthy ack delays.

The trace in Table 2 was recorded between the NT client NT4Wa and the NT server NT4S, with Nagle active on the server. In all 100 transactions, the first 10-byte server response segment is acknowledged separately by the client after a delay of 187.3 ms on average. Packets 6 and 10 are acks sent by the client, which were delayed because the client TCP has no data to send and has received a partial segment.


#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                0.3           0.3
3                                0.3           0.6
4    Request, 16                 9.1           9.7
5    First response, 10          3.6           13.3
6    (ack)                       114.4         127.7
7    Second response, 10         0.2           127.9
8    Request, 16                 8.3           136.2
9    First response, 10          3.5           139.7
10   (ack)                       188.3         328.0
11   Second response, 10         0.2           328.2
     and so on

Table 2. NT client NT4Wa and NT server NT4S, with Nagle active on the server. In all transactions, the client acknowledges the first server response segment separately and after a lengthy delay. In this configuration, the delay is 187.3 ms on average, after a slightly lower initial delay of 114.4 ms in the first transaction.

The trace in Table 3 was recorded between the Solaris client and the NT server NT4Wa, with Nagle active on the server. For all 100 transactions, the client acknowledges the first 10-byte server response segment after a delay of 46.2 ms on average. Again, Nagle prevents the server from sending the second half of its response earlier. Although in this case the delay is much smaller than for Win32 clients, it still dominates the average transaction duration of 47.5 ms, which is an order of magnitude longer than the 4.3 ms measured with Nagle off (Table 5).

#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                0.4           0.4
3                                0.7           1.1
4    Request, 16                 3.1           4.2
5    First response, 10          0.8           5.0
6    (ack)                       0.6           5.6
7    Second response, 10         0.3           5.9
8    Request, 16                 2.1           8.0
9    First response, 10          0.8           8.8
10   (ack)                       41.8          50.6
11   Second response, 10         0.3           50.9
     and so on

Table 3. Solaris client and NT server NT4Wa, with Nagle active on the server. The server’s first response segment is always acknowledged separately. The ack is delayed in the second and subsequent transactions. In this configuration, the delay is 46.2 ms on average, after virtually no delay in the first transaction. Throughout this paper, the arrow (→ or ←) indicates a segment's direction between client on the left and server on the right. So → indicates a segment from client to server, as in "client → server", and vice versa.

We show only one sample trace of all Win32/Win32 combinations (Table 2), because all Win32/Win32 traces perform similarly. The Solaris/NT trace shows that the amount of delay chosen by the actual TCP implementation can vary widely (here, by a factor of 4).

These delays are consistent with the analysis of delay durations in Section 9 of [Paxson 97].

4.4 TCP segment traces: performance with short ack delay, and Nagle off 

The trace shown in Table 4 was recorded between the Linux client and the NT server NT4Wa, with Nagle deactivated on the server. In this trace, the Linux client does not send a separate ack for the first 10-byte server response segment (segments 5 and 9 in Table 4). Since Nagle is deactivated, the NT server immediately pushes the second 10-byte server response segment to the client (segments 6 and 10 in Table 4), resulting in a very low overall transaction duration of about 4.4 ms on average.

#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                0.5           0.5
3                                0.9           1.4
4    Request, 16                 9.7           11.1
5    First response, 10          1.5           12.6
6    Second response, 10         1.2           13.8
7    (ack)                       13.6          27.4
8    Request, 16                 32.5          59.9
9    First response, 10          1.3           61.2
10   Second response, 10         1.2           62.4
     and so on

Table 4. The first two request/response transactions between the Linux client and the NT server NT4Wa, with Nagle deactivated. Shown are the differential (“Delta”) and cumulative wire times, as measured by NetXRay.

Note, however, that in some cases the Linux client does acknowledge the second server response segment separately, before initiating the subsequent transaction. In the example, segment 7 is sent 32.5 ms before the second transaction, which begins with segment 8. This gap is due to influences that we did not investigate. It has no impact on the delayed ack effect and happened only 12 times out of 100 in the recorded trace; in the other 88 transactions, the ack was piggybacked onto the next client request segment.

The Linux/NT trace stands out because no artificial ack delay distorts the performance profile of the test application. Admittedly, the Nagle algorithm was turned off in this particular execution. As the performance summary in the next section shows, however, the Linux client remains the fastest performer even if Nagle is active on the server.


4.5 Test application: performance summary 

Table 5 lists the average delays caused by the delayed ack algorithm for all configurations, in both directions, with Nagle turned on and turned off. It also lists the overall transaction times, showing that whenever the ack is delayed, the delay is the dominant factor.

The Windows 95 and Windows 98 TCP implementations apparently ignore the system call that deactivates the Nagle algorithm.¹ If Nagle is deactivated on any Windows NT server, the time spent in each transaction is reduced by about 190 ms. The same recipe reduces the overall transaction time by about 45 ms for the Solaris client, and by about 15 ms for the Linux client.

                      Delay of ack [ms]        Transaction [ms]
Client    Server      Nagle on    Nagle off    Nagle on    Nagle off    Trace
NT4Wa     NT4S        187.3       0.4          191.4       7.9          Table 2
NT4S      NT4Wa       187.0       0.2          191.6       3.0
NT4Wb     Win95       193.1       194.1        193.1       199.0
Win95     NT4Wb       208.8       0.7          209.5       2.1
NT4Wb     Win98       192.5       193.8        198.1       199.8
Win98     NT4Wb       192.5       1.0          192.9       1.5
NT4Wa     NT5S        195.9       0.3          196.7       0.7
NT5S      NT4Wa       197.4       0.2          198.6       1.6
NT4Wa     Solaris     194.2       0.3          196.7       3.0
Solaris   NT4Wa       46.2        0.6          47.5        4.3          Table 3
NT4Wa     Linux       188.7       0.3          191.8       3.9
Linux     NT4Wa       15.8        ––––         21.5        4.4          Table 4

Table 5. A summary of all 24 executions of the test application, listed by client/server configuration and state of the Nagle algorithm on the server. Shown is the average wire time between the first 10-byte server response segment and its acknowledgement. Also shown is the average duration of a transaction over all 100 transactions, measured between the appearances of the request segment and the final response or ack segment on the network. The first server response segment is acknowledged separately in every execution except the last one with Nagle off, where no separate ack occurs (––––).
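The per-transaction savings cited before Table 5 can be checked directly from its transaction columns. The short computation below (ours, using only the table's values) subtracts the Nagle-off time from the Nagle-on time for each client/server pair; a negative value means that disabling Nagle did not help, as with the Windows 95 and 98 servers.

```python
# (client, server): (transaction ms with Nagle on, with Nagle off), copied from Table 5
TABLE5 = {
    ("NT4Wa", "NT4S"): (191.4, 7.9),
    ("NT4S", "NT4Wa"): (191.6, 3.0),
    ("NT4Wb", "Win95"): (193.1, 199.0),
    ("Win95", "NT4Wb"): (209.5, 2.1),
    ("NT4Wb", "Win98"): (198.1, 199.8),
    ("Win98", "NT4Wb"): (192.9, 1.5),
    ("NT4Wa", "NT5S"): (196.7, 0.7),
    ("NT5S", "NT4Wa"): (198.6, 1.6),
    ("NT4Wa", "Solaris"): (196.7, 3.0),
    ("Solaris", "NT4Wa"): (47.5, 4.3),
    ("NT4Wa", "Linux"): (191.8, 3.9),
    ("Linux", "NT4Wa"): (21.5, 4.4),
}


def nagle_off_savings(table):
    """Per-transaction time saved (ms) by disabling Nagle on the server."""
    return {pair: round(on - off, 1) for pair, (on, off) in table.items()}


for (client, server), saved in nagle_off_savings(TABLE5).items():
    print(f"{client:>7} -> {server:<7} {saved:7.1f} ms")
```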

5 Production performance problems

The lab experiments illustrate the pattern that leads to lengthy ack delays. In this section we show that this pattern occurs in practice and creates significant distortions in response times experienced during standard activities on Intranets. We argue that delayed acknowledgments can incur serious performance penalties in HTTPS and HTTP.

5.1 HTTPS 

HTTPS is a two-layered protocol: HTTP transactions occur on top of the Secure Sockets Layer (SSL). Conducting SSL communications involves the following steps:

• A client establishes a TCP connection with a server.

• On top of TCP, the client and server establish a secure SSL communication channel [Freier 96]. The client and server negotiate a mutually agreeable cipher, a stream encryption algorithm and an authentication method pair. The client and server use a public key cryptographic protocol to exchange secret session keys that will be used to encrypt and decrypt application layer messages. For more details, see Bolyard’s trace of SSL session setup [Bolyard 97].

• On top of SSL, the client and server exchange one or more encrypted HTTP messages.

¹ This violates Section 4.2.3.4 of the TCP specification in RFC 1122, which states that there MUST be a way for an application to disable the Nagle algorithm on an individual connection.

The secret session keys negotiated in the second step can be reused in subsequent connections. Reusing session keys reduces the time it takes to set up an SSL connection [Goldberg 98].

The following two traces between the NT client NT4Wa and the NT server NT4S (Microsoft Internet Information Server and 128-bit encryption) were recorded back-to-back. In both cases, the client requested a small document (1 KB) from the server. In the first trace, shown in Table 6, a new secret session key was negotiated. In the second trace, shown in Table 7, the secret session key was reused, leading to a slightly different sequence of messages. The performance profile of these traces is reliably repeatable.

#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                0.2
3                                0.4
4    SSL, 93                     203.3
5    SSL, 726                    1.4           205.3
6    SSL, 204                    88.5          293.8
7    SSL, 6                      116.2         410.0
8    (ack)                       156.1         566.1
9    SSL, 61                     0.3           566.4
10   HTTPS Request, 46           10.7          577.1
11   HTTPS Resp 1, 306           3.3           580.4
12   HTTPS Resp 2, 1021          1.3           581.7
13   (ack)                       1.5           583.2
14   (ack)                       6.4           589.6
15   (ack)                       0.2           589.8
Note: Initial 0.6 ms for TCP handshake and 203.3 ms for SSL calculations at the server are omitted.

Table 6. An HTTPS transaction between NT4Wa and NT4S running Microsoft IIS with 128-bit encryption. Shown are the TCP handshake (segments 1 to 3), the SSL handshake (segments 4 to 9, 1090 bytes of data), the HTTP request/response messages (segments 10 to 12) and the TCP shutdown (segments 13 to 15). Segment 8 is delayed for 156.1 ms.


#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                0.2           0.2
3                                0.3           0.5
4    SSL, 108                    11.1          11.6
5    SSL, 79                     1.4           13.0
6    (ack)                       133.1         146.1
7    SSL, 67                     0.3           146.4
8    SSL, 67                     4.5           150.9
9    (ack)                       182.8         333.7
10   Request, 46                 0.4           334.1
11   HTTPS Resp 1, 306           3.1           337.2
12   HTTPS Resp 2, 1021          1.3           338.5
13   (ack)                       1.5           340.0
14   (ack)                       6.0           346.0
15   (ack)                       0.1           346.1

Table 7. An HTTPS transaction between NT4Wa and NT4S, reusing the previously negotiated session key. The SSL handshake requires the exchange of only 321 bytes of data; however, acknowledgements are delayed twice, once in each direction, in segments 6 and 9.

Table 6 and Table 7 show that the delay of acknowledgements significantly slows the HTTPS transaction. They also show that acknowledgements can be delayed in both directions, from the client to the server and from the server to the client. Protocols like SSL, whose message exchanges are more complex than simple request/response schemes, can easily fall into the delayed ack trap.

How significant is the slowdown? Table 8 compares the measured duration of the SSL handshake and the entire transaction with the hypothetical durations obtained when the ack delay identified in segment 8 (Table 6) and segments 6 and 9 (Table 7) is subtracted. The SSL handshake is slowed by a factor of 18 when session keys are reused! Moreover, this huge slowdown persists when the entire HTTPS transaction is considered, because the HTTP message exchange and the TCP handshake and shutdown take little time.

               SSL handshake                       Entire HTTPS transaction
Session key    Delay     No delay    Slowdown     Delay     No delay    Slowdown
New            565 ms    409 ms      38 %         589 ms    433 ms      36 %
Reused         333 ms    17 ms       1826 %       346 ms    30 ms       1046 %

Table 8. A comparison of the measured duration of the SSL handshake and the entire HTTPS transaction, with the overall time attributed to ack delay included (column “Delay”) and subtracted (column “No delay”), respectively. Also shown is the slowdown factor, (Delay − No delay) / No delay, as a percentage.
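The reused-key slowdowns can be recomputed from Table 7. In the sketch below (our reconstruction), the No-delay duration is the measured duration minus the ack delays of segments 6 and 9. We assume the SSL handshake is timed from segment 3 (0.5 ms) to segment 9 (333.7 ms); this interval is our inference, chosen because it reproduces the table's 1826 % and 1046 %.

```python
def slowdown(measured_ms, ack_delay_ms):
    """Slowdown as defined for Table 8: (Delay - No delay) / No delay."""
    no_delay = measured_ms - ack_delay_ms
    return (measured_ms - no_delay) / no_delay


# Reused session key, Table 7: ack delays in segments 6 and 9.
ack_delays = 133.1 + 182.8
handshake = 333.7 - 0.5        # segment 3 to segment 9 (assumed interval)
transaction = 346.1            # whole trace
print(f"SSL handshake: {100 * slowdown(handshake, ack_delays):.0f} %")    # 1826 %
print(f"Entire HTTPS:  {100 * slowdown(transaction, ack_delays):.0f} %")  # 1046 %
```

The same function applied to the new-key transaction of Table 6 (589.8 ms total, 156.1 ms ack delay) gives the 36 % shown in Table 8.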

5.2 HTTP  

Usually, HTTP transactions do not exhibit the delayed ack performance problem. In some cases, however, we find that servers fail to coalesce the headers and data of an HTTP response message into a single segment.


Table 9 shows such a case for an HTTP/1.1 transaction between an NT client and a Netscape Enterprise 3.51 server connected by a Token Ring network. Shown are the first two transactions of the persistent HTTP/1.1 session: the client requests an HTML page (segment 4), and subsequently an embedded GIF image (segment 6). The server returns the headers and data of the HTML page in a single segment, but sends the headers of the HTTP response for the image separately, in partial segment 7. This makes the NT client delay the acknowledgment by 199.4 ms, causing the server (which runs Nagle) to stall before it finally sends the image body.

About half of the duration of the first two transactions is due to the delayed acknowledgment.

Note that the protocol version (1.1) is not significant in this situation.

#    Segment, payload [bytes]    Delta [ms]    Cumulative [ms]
1    TCP handshake
2                                3.0           3.0
3                                0.4           3.4
4    HTTP GET, 390               1.7           5.1
5    HTTP RESP, 1944             10.5          15.6
6    HTTP GET, 414               164.3         179.9
7    HTTP RESP, 354              8.8           188.7
8    (ack)                       199.4         388.1
9    HTTP RESP, 3255             12.9          401.0
     and so on

Table 9. Delays in a production HTTP/1.1 transaction between a browser running on NT and a Netscape Enterprise/3.5.1 server. The second HTTP response is interrupted by a delayed ack in segment 8. The partial segment 7 triggers the delayed ack. Segment 7 contains the HTTP headers, while segment 9 contains the response body, in this case a GIF image.

6 Solutions

We’ve observed several situations in which a sending application (the sender can be either a server or a client) issues several socket writes, in sequence, of fewer than MSS bytes. If the sender leaves Nagle on, then the SWS avoidance algorithms, Nagle at the sender and the delayed ack at the receiver, produce timing like that in Table 2.

We now consider how to solve this problem. We examine it from the viewpoints of two professions:

• Application programmers

• Server administrators

6.1 Application Programmers

Application programmers must work with the current sockets interface, operating systems, and TCP designs and implementations. They face the choices listed in Table 10.

Write strategy | Description | App delay (ms) | Advantages | Disadvantages | # of IP packets
Writev | transmission delayed as in coalesced writes; writev gathers data from a set of buffers | 10 | same as coalesced writes, plus one less data copy | same as coalesced writes | 1
Coalesced writes | write delayed until the full transmission (or more than MSS of data) has accumulated; can use ANSI C stdio and fflush | 10 | ideal network performance; only 1 IP packet | may be difficult to implement, because multiple modules may write to the same socket | 1
Two writes, Nagle off | set TCP_NODELAY on each socket | 10 to 20 | trivial to implement; near-ideal performance with respect to application-layer delay | an extra IP packet | 2
Two writes | separate writes, less than MSS bytes each | 230 | none | slow; 3 IP packets | 3

Table 10. Comparison of techniques for writing two data buffers to a TCP socket. The write strategies are ordered from best to worst. We assume that the application has two small (less than MSS) data buffers to send. In the ‘App delay’ column, we assume a one-way latency of approximately 10 ms and a delayed acknowledgment of about 200 ms, as implemented in Win32. ‘Coalesced writes’ simply buffer data and issue just one write call. Writev writes a list, gathering multiple buffers; it avoids a data copy in application space.
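In Python, the closest socket-level analogue of writev is `socket.sendmsg`, which accepts a list of buffers (Unix only). The sketch below contrasts it with a coalesced single write; the helper names are hypothetical, and a short write is completed with `sendall`, since a stream socket may accept only part of the data.

```python
import socket


def send_gathered(sock, header: bytes, body: bytes) -> int:
    """writev-style send: both buffers are handed to the kernel in one call."""
    total = len(header) + len(body)
    sent = sock.sendmsg([header, body])
    if sent < total:
        # Partial send: finish the remainder (this fallback does copy).
        sock.sendall((header + body)[sent:])
    return total


def send_coalesced(sock, header: bytes, body: bytes) -> int:
    """Coalesced write: copy into one buffer, then issue a single send."""
    sock.sendall(header + body)
    return len(header) + len(body)
```

For a small header and body, either call typically lets the data leave in a single segment even with Nagle on, unlike two separate `sendall` calls.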

Stevens [Stevens 98] (section 7.9, page 204) examines these alternatives and states:

  There are three ways to fix this …
  1. Use writev instead of two calls to write. … This is the preferred solution.
  2. Copy the … data … into a single buffer and call write once …
  3. Set the TCP_NODELAY socket option and continue to call write two times. This is the least desirable solution (italics added).

Considering performance alone, we agree with Stevens’ prioritization. However, considering programmer convenience, turning off Nagle may be preferred. When multiple software modules write into one socket, they typically interface by passing the socket handle. With this interface it is difficult to coalesce writes, or to use writev, for write operations in different modules.

For example, the SSLeay library interface [Hudson] takes a socket handle and returns a handle for a secure socket. This prevents coalescing SSL handshake writes with other writes. Thus, coalescing segments 8 and 10 in Table 7 into one write would involve rewriting the SSLeay interface.

Unless an application is architected to coalesce writes or use writev (perhaps by programming to a sockets wrapper, such as ANSI C stdio, which supports a flush operation), the programmer may encounter great difficulty implementing coalesced writes or writev. Furthermore, we believe that attention to low-level TCP details, such as comparing a write’s data size with the MSS, should not be the responsibility of an application programmer.
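The buffered-wrapper approach can be sketched with Python's `socket.makefile`, which plays roughly the role that `fdopen` and stdio play in C: independent modules write to the buffered stream, and one flush per message hands the data to TCP, typically in a single send as long as the message fits in the buffer. The `write_headers` and `send_response` helpers are hypothetical, with `write_headers` standing in for a separate module that never sees the socket.

```python
import socket


def write_headers(out, headers):
    """Hypothetical module: it sees only the buffered stream, never the socket."""
    for name, value in headers:
        out.write(f"{name}: {value}\r\n".encode("latin-1"))
    out.write(b"\r\n")


def send_response(sock, status, headers, body: bytes) -> None:
    out = sock.makefile("wb")          # buffered writer over the socket
    try:
        out.write(status.encode("latin-1") + b"\r\n")
        write_headers(out, headers)    # written by another module
        out.write(body)
        out.flush()                    # one flush per application message
    finally:
        out.close()                    # closes the wrapper, not the socket
```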

In conclusion, we recommend that applications turn off Nagle (set the TCP_NODELAY option) on any socket that will not trigger the silly window syndrome. In addition, if possible, applications should coalesce small writes (those less than MSS, typically 1460 bytes on Ethernets) or use writev. We recommend that server developers distribute patches to turn off Nagle in widely used servers.

6.2 Server Administrators

To improve the performance of clients sending small packets with Nagle on, a server administrator should run a server on a TCP implementation with small acknowledgment delays. Three major operating systems, in increasing order of delay, are Linux (delay adjusted to interarrival time, observed at 15 to 20 ms), Solaris (50 ms) and Win32 (NT, 200 ms). For example, the server TCP implementation determines the delay of the ack sent in segment 9 of the SSL key exchange in Table 7.

Server administrators should install patches which turn off Nagle.


7 Conclusions

We show that excessively delayed TCP acknowledgements can significantly slow Web transactions on Intranets.

Web server and browser implementers are urged to solve this problem by turning off the Nagle algorithm on every TCP socket. Web servers and browsers can safely turn off Nagle: they will not suffer from the silly window syndrome because they read full buffers from TCP sockets.

A TCP implementation can reduce the delay experienced by a sender that leaves Nagle enabled by shortening its acknowledgement delays. Server administrators are urged to choose operating systems with TCP implementations that employ shorter ack delays.

8 References

[Bolyard 97] Bolyard, N., “Export Client SSL Connection Details”, 1997, http://home.netscape.com/eng/ssl3/traces/trc-clnt-ex.html

[Braden 89] Braden, R., Editor, RFC 1122, Requirements for Internet Hosts -- Communication Layers, October 1989

[Comer 95] Comer, D., Internetworking with TCP/IP Volume 1: Principles, Protocols, and Architecture, third edition. Englewood Cliffs, NJ: Prentice Hall, 1995

[Comer 96] Comer, D. and D. Stevens, Internetworking with TCP/IP Volume III: Client-Server Programming and Applications, BSD Socket Version, second edition. Englewood Cliffs, NJ: Prentice Hall, 1996

[Comer 99] Comer, D. and D. Stevens, Internetworking with TCP/IP Volume II: Design, Implementation, and Internals, third edition. Englewood Cliffs, NJ: Prentice Hall, 1999

[Gaudet] Apache Performance Notes, http://www.apache.org/docs/misc/perf-tuning.html

[Freier 96] Freier, Alan O., Philip Karlton, Paul C. Kocher, “The SSL Protocol Version 3.0”, Internet Draft, November 18, 1996, http://home.netscape.com/eng/ssl3/draft302.txt

[Hall 93] Hall, M., et al., Windows Sockets: An Open Interface for Network Programming Under Microsoft Windows, Version 1.1, Revision A, 1993

[Heidemann 97] Heidemann, J., “Performance Interactions Between P-HTTP and TCP Implementations”, ACM Computer Communication Review, April 1997

[Hudson] Hudson, Tim J., and Eric A. Young, “SSLeay Programmer Reference”, circa 1997, http://psych.psy.uq.oz.au/~ftp/Crypto/ssl.html

[Microsoft 98] Microsoft, “PRB: Poor TCP/IP Performance When Doing Small Sends”, http://support.microsoft.com/support/kb/articles/q126/7/16.asp, 7/29/1998

[Microsoft 97] Microsoft, “Remote Directory Lists Are Slower Than Local Directory Listings”, http://support.microsoft.com/support/kb/articles/q177/2/66.asp, 12/16/1997

[Nagle 84] Nagle, J., “Congestion Control in IP/TCP Internetworks”, RFC 896, Network Information Center, SRI International, Menlo Park, CA, 1984

[Paxson 97] Paxson, V., “Automated Packet Trace Analysis of TCP Implementations”, SIGCOMM 97

[Stevens 94] Stevens, W. R., TCP/IP Illustrated, Volume 1: The Protocols. Reading, MA: Addison-Wesley, 1994

[Stevens 98] Stevens, W. R., Unix Network Programming, Volume 1, second edition, 1998

[Sun 98] Sun Microsystems, “TCP Slow Start Tuning For Solaris 2.6”, http://www.sun.com/sun-on-net/performance/tcp.slowstart.html

[Nielsen 98] Nielsen, Henrik Frystyk, “HTTP/1.1 and Nagle's Algorithm”, 1998/04/29, http://www.w3.org/Protocols/HTTP/Performance/Nagle/

[Nielsen 97] Nielsen, Henrik Frystyk, et al., “Network Performance Effects of HTTP/1.1, CSS1, and PNG”, SIGCOMM 97, http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html
