UNIT_IV


Transport Layer

Process-to-Process Delivery

The transport layer is responsible for process-to-process delivery of the entire message. A process is an application program running on a host.

At the transport layer, we need a transport layer address, called a port number, to choose among multiple processes running on the destination host. The destination port number is needed for delivery; the source port number is needed for the reply. In the Internet model, the port numbers are 16-bit integers between 0 and 65,535. The client program defines itself with a port number, chosen randomly by the transport layer software running on the client host. This is the ephemeral port number.

IANA Ranges

The IANA (Internet Assigned Numbers Authority) has divided the port numbers into three ranges: well-known, registered, and dynamic (or private).

- Well-known ports. The ports ranging from 0 to 1023 are assigned and controlled by IANA. These are the well-known ports.

- Registered ports. The ports ranging from 1024 to 49,151 are not assigned or controlled by IANA. They can only be registered with IANA to prevent duplication.

- Dynamic ports. The ports ranging from 49,152 to 65,535 are neither controlled nor registered. They can be used by any process. These are the ephemeral ports.

Socket Addresses

Process-to-process delivery needs two identifiers, the IP address and the port number, at each end to make a connection. The combination of an IP address and a port number is called a socket address. The client socket address defines the client process uniquely, just as the server socket address defines the server process uniquely.
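To make socket addresses concrete, here is a minimal Python sketch (not part of the original notes) that creates a UDP socket, lets the operating system pick an ephemeral port, and prints both socket addresses. The server IP and port are placeholder values.

```python
# A minimal sketch using Python's standard socket module. Binding to port 0
# asks the OS to choose an ephemeral port, as described above; the (IP, port)
# pair on each side is a socket address. 192.0.2.10 is a placeholder address.
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("", 0))                    # port 0: let the OS pick an ephemeral port
server_address = ("192.0.2.10", 53)     # a well-known port (DNS) on the server side

print("client socket address:", client.getsockname())  # e.g., ('0.0.0.0', 51514)
print("server socket address:", server_address)
client.close()
```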

Multiplexing and Demultiplexing

The addressing mechanism allows multiplexing and demultiplexing by the transport layer.


Multiplexing

At the sender site, there may be several processes that need to send packets. However, there is only one transport layer protocol at any time. This is a many-to-one relationship and requires multiplexing. The protocol accepts messages from different processes, differentiated by their assigned port numbers. After adding the header, the transport layer passes the packet to the network layer.

Demultiplexing

At the receiver site, the relationship is one-to-many and requires demultiplexing. The transport layer receives datagrams from the network layer. After error checking and dropping of the header, the transport layer delivers each message to the appropriate process based on the port number.

Connectionless Versus Connection-Oriented Service

A transport layer protocol can either be connectionless or connection-oriented.

Connectionless Service

In a connectionless service, the packets are sent from one party to another with no need for connection establishment or connection release. The packets are not numbered; they may be delayed or lost or may arrive out of sequence. There is no acknowledgment either. We will see shortly that one of the transport layer protocols in the Internet model, UDP, is connectionless.

Connection-Oriented Service

In a connection-oriented service, a connection is first established between the sender and the receiver. Data are transferred. At the end, the connection is released. We will see shortly that TCP and SCTP are connection-oriented protocols.

Reliable Versus Unreliable

The transport layer service can be reliable or unreliable. If the application layer program needs reliability, we use a reliable transport layer protocol by implementing flow and error control at the transport layer. This means a slower and more complex service. On the other hand, if the application program does not need reliability because it uses its own flow and error control mechanism, needs fast service, or the nature of the service does not demand flow and error control (real-time applications), then an unreliable protocol can be used. In the Internet, there are three common transport layer protocols, as we have already mentioned. UDP is connectionless and unreliable; TCP and SCTP are connection-oriented and reliable. These three can respond to the demands of the application layer programs.

USER DATAGRAM PROTOCOL (UDP)

The User Datagram Protocol (UDP) is called a connectionless, unreliable transport protocol. It does not add anything to the services of IP except to provide process-to-process communication instead of host-to-host communication. Also, it performs very limited error checking. UDP is a very simple protocol using a minimum of overhead. If a process wants to send a small message and does not care much about reliability, it can use UDP. Sending a small message by using UDP takes much less interaction between the sender and receiver than using TCP or SCTP.

Well-Known Ports for UDP

7 – Echo
9 – Discard
11 – Users
13 – Daytime
69 – TFTP


User Datagram Format

- Source port number. This is the port number used by the process running on the source host. It is 16 bits long, which means that the port number can range from 0 to 65,535. If the source host is the client (a client sending a request), the port number, in most cases, is an ephemeral port number requested by the process and chosen by the UDP software running on the source host. If the source host is the server (a server sending a response), the port number, in most cases, is a well-known port number.

- Destination port number. This is the port number used by the process running on the destination host. It is also 16 bits long. If the destination host is the server (a client sending a request), the port number, in most cases, is a well-known port number. If the destination host is the client (a server sending a response), the port number, in most cases, is an ephemeral port number. In this case, the server copies the ephemeral port number it has received in the request packet.

- Length. This is a 16-bit field that defines the total length of the user datagram, header plus data. The 16 bits can define a total length of 0 to 65,535 bytes. However, the total length needs to be much less because a UDP user datagram is stored in an IP datagram with a total length of 65,535 bytes:

  UDP length = IP length − IP header's length

- Checksum. This field is used to detect errors over the entire user datagram (header plus data). The UDP checksum calculation is different from the one for IP and ICMP. Here the checksum includes three sections: a pseudoheader, the UDP header, and the data coming from the application layer. The pseudoheader is the part of the header of the IP packet in which the user datagram is to be encapsulated, with some fields filled with 0s.
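The checksum computation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, assuming IPv4; the addresses, ports, and payload are made up. It builds the pseudoheader, zeroes the checksum field in the UDP header, and takes the one's complement of the one's-complement sum of all 16-bit words.

```python
# A sketch of the UDP checksum over pseudoheader + UDP header + data.
import socket
import struct

def ones_complement_sum16(data: bytes) -> int:
    """One's-complement sum of 16-bit words, padding an odd final byte with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)                      # UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) +
              struct.pack("!BBH", 0, 17, length))  # zero byte, protocol 17, UDP length
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    csum = 0xFFFF - ones_complement_sum16(pseudo + header + payload)
    return csum or 0xFFFF                          # a computed 0 is transmitted as all 1s

print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 49152, 13, b"hello")))
```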

UDP Operation

Connectionless Services

As mentioned previously, UDP provides a connectionless service. This means that each user datagram sent by UDP is an independent datagram. There is no relationship between the different user datagrams even if they are coming from the same source process and going to the same destination program. The user datagrams are not numbered. Also, there is no connection establishment and no connection termination, as is the case for TCP. This means that each user datagram can travel on a different path.

Flow and Error Control

UDP is a very simple, unreliable transport protocol. There is no flow control and hence no window mechanism. The receiver may overflow with incoming messages. There is no error control mechanism in UDP except for the checksum. This means that the sender does not know if a message has been lost or duplicated. When the receiver detects an error through the checksum, the user datagram is silently discarded. The lack of flow control and error control means that the process using UDP should provide these mechanisms.

Encapsulation and Decapsulation

To send a message from one process to another, the UDP protocol encapsulates and decapsulates messages in an IP datagram.
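The following short sketch, using Python's socket API on the loopback address, illustrates the connectionless service: there is no connection establishment, and each sendto() call produces an independent user datagram. The port number is a made-up value.

```python
# A minimal UDP sender/receiver on localhost; values are for illustration only.
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 9999))           # an incoming queue now exists for port 9999

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for msg in (b"first", b"second", b"third"):  # independent, unnumbered datagrams
    sender.sendto(msg, ("127.0.0.1", 9999))

for _ in range(3):
    data, addr = receiver.recvfrom(2048)     # each message is delivered intact
    print(addr, data)

sender.close()
receiver.close()
```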

Queuing


In UDP, queues are associated with ports. At the client site, when a process starts, it requests a port number from the operating system. Some implementations create both an incoming and an outgoing queue associated with each process. Other implementations create only an incoming queue associated with each process. Note that even if a process wants to communicate with multiple processes, it obtains only one port number and eventually one outgoing and one incoming queue. The queues opened by the client are, in most cases, identified by ephemeral port numbers. The queues function as long as the process is running. When the process terminates, the queues are destroyed.

The client process can send messages to the outgoing queue by using the source port number specified in the request. UDP removes the messages one by one and, after adding the UDP header, delivers them to IP. An outgoing queue can overflow. If this happens, the operating system can ask the client process to wait before sending any more messages.

When a message arrives for a client, UDP checks to see if an incoming queue has been created for the port number specified in the destination port number field of the user datagram. If there is such a queue, UDP sends the received user datagram to the end of the queue. If there is no such queue, UDP discards the user datagram and asks the ICMP protocol to send a port unreachable message to the server. All the incoming messages for one particular client program, whether coming from the same or a different server, are sent to the same queue. An incoming queue can overflow. If this happens, UDP drops the user datagram and asks for a port unreachable message to be sent to the server.

Use of UDP

The following lists some uses of the UDP protocol:

- UDP is suitable for a process that requires simple request-response communication with little concern for flow and error control. It is not usually used for a process such as FTP that needs to send bulk data.

- UDP is suitable for a process with internal flow and error control mechanisms. For example, the Trivial File Transfer Protocol (TFTP) process includes flow and error control. It can easily use UDP.

- UDP is a suitable transport protocol for multicasting. Multicasting capability is embedded in the UDP software but not in the TCP software.

- UDP is used for management processes such as SNMP.

- UDP is used for some route updating protocols such as the Routing Information Protocol (RIP).

Transmission Control Protocol (TCP)


RELIABLE BYTE STREAM

TCP is called a connection-oriented, reliable transport protocol. It adds connection-oriented and reliability features to the services of IP.

TCP Services

Process-to-Process Communication

Like UDP, TCP provides process-to-process communication using port numbers.

Well-known port numbers used by TCP

20 – FTP, Data (File Transfer Protocol, data connection)
21 – FTP, Control (File Transfer Protocol, control connection)
23 – TELNET (Terminal Network)
25 – SMTP (Simple Mail Transfer Protocol)
53 – DNS (Domain Name Server)
67 – BOOTP (Bootstrap Protocol)
80 – HTTP (Hypertext Transfer Protocol)
111 – RPC (Remote Procedure Call)

Stream Delivery Service

TCP allows the sending process to deliver data as a stream of bytes and allows the receiving process to obtain data as a stream of bytes. TCP creates an environment in which the two processes seem to be connected by an imaginary "tube" that carries their data across the Internet.

End-to-End Issues

TCP supports logical connections between processes that are running on any two computers in the Internet. This means that TCP needs an explicit connection establishment phase and an explicit connection teardown phase. TCP connections are likely to have widely different round-trip times. TCP assumes that each packet has a maximum lifetime. The exact lifetime, known as the maximum segment lifetime (MSL), is an engineering choice. The amount of resources dedicated to any one TCP connection is highly variable, especially considering that any one host can potentially support hundreds of TCP connections at the same time. Finally, the sending side of a TCP connection has no idea what links will be traversed to reach the destination.

Segment Format


- Source port address. This is a 16-bit field that defines the port number of the application program in the host that is sending the segment. This serves the same purpose as the source port address in the UDP header.

- Destination port address. This is a 16-bit field that defines the port number of the application program in the host that is receiving the segment. This serves the same purpose as the destination port address in the UDP header.

- Sequence number. This 32-bit field defines the number assigned to the first byte of data contained in this segment. As we said before, TCP is a stream transport protocol. To ensure connectivity, each byte to be transmitted is numbered. The sequence number tells the destination which byte in this sequence is the first byte in the segment. During connection establishment, each party uses a random number generator to create an initial sequence number (ISN), which is usually different in each direction.

- Acknowledgment number. This 32-bit field defines the byte number that the receiver of the segment is expecting to receive from the other party. If the receiver of the segment has successfully received byte number x from the other party, it defines x + 1 as the acknowledgment number. Acknowledgment and data can be piggybacked together.

- Header length. This 4-bit field indicates the number of 4-byte words in the TCP header. The length of the header can be between 20 and 60 bytes. Therefore, the value of this field can be between 5 (5 × 4 = 20) and 15 (15 × 4 = 60).

- Reserved. This is a 6-bit field reserved for future use.

- Control flags. The six flag bits are:
  - URG: the value of the urgent pointer field is valid.
  - ACK: the value of the acknowledgment field is valid.
  - PSH: push the data.
  - RST: reset the connection.
  - SYN: synchronize sequence numbers during connection.
  - FIN: terminate the connection.

- Window size. This field defines the size of the window, in bytes, that the other party must maintain. Note that the length of this field is 16 bits, which means that the maximum size of the window is 65,535 bytes. This value is normally referred to as the receiving window (rwnd) and is determined by the receiver. The sender must obey the dictation of the receiver in this case.

- Checksum. This 16-bit field contains the checksum. The calculation of the checksum for TCP follows the same procedure as the one described for UDP. However, the inclusion of the checksum in the UDP datagram is optional, whereas the inclusion of the checksum for TCP is mandatory.

- Urgent pointer. This 16-bit field, which is valid only if the urgent flag is set, is used when the segment contains urgent data. It defines the number that must be added to the sequence number to obtain the number of the last urgent byte in the data section of the segment. This will be discussed later in this chapter.

- Options. There can be up to 40 bytes of optional information in the TCP header.
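To make the field layout concrete, the sketch below unpacks the fixed 20-byte TCP header with Python's struct module. The bit positions follow the standard header layout described above; the sample segment is fabricated for illustration.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4      # 4-bit field counts 4-byte words
    flags = offset_flags & 0x3F                # URG, ACK, PSH, RST, SYN, FIN bits
    return {
        "src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
        "header_len": header_len, "window": window,
        "checksum": checksum, "urgent_ptr": urgent,
        "SYN": bool(flags & 0x02), "ACK": bool(flags & 0x10), "FIN": bool(flags & 0x01),
    }

# A fabricated SYN segment: ports 49152 -> 80, seq 1000, 20-byte header.
sample = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(sample))
```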

Connection Management

Connection Establishment and Termination

A TCP connection begins with a client (caller) doing an active open to a server (callee). Assuming that the server had earlier done a passive open, the two sides engage in an exchange of messages to establish the connection. Only after this connection establishment phase is over do the two sides begin sending data. Likewise, as soon as a participant is done sending data, it closes one direction of the connection, which causes TCP to initiate a round of connection termination messages.

Three-Way Handshake

The algorithm used by TCP to establish and terminate a connection is called a three-way handshake. The three-way handshake involves the exchange of three messages between the client and the server. The idea is that two parties want to agree on a set of parameters, which, in the case of opening a TCP connection, are the starting sequence numbers the two sides plan to use for their respective byte streams. In general, the parameters might be any facts that each side wants the other to know about. First, the client (the active participant) sends a segment to the server (the passive participant) stating the initial sequence number it plans to use (Flags = SYN, SequenceNum = x). The server then responds with a single segment that both acknowledges the client's sequence number (Flags = ACK, Ack = x + 1) and states its own beginning sequence number (Flags = SYN, SequenceNum = y). That is, both the SYN and ACK bits are set in the Flags field of this second message. Finally, the client responds with a third segment that acknowledges the server's sequence number (Flags = ACK, Ack = y + 1).
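The exchange can be traced with a small plain-Python simulation (an illustration, not a real TCP implementation): the segments are dictionaries, and the initial sequence numbers x and y are chosen at random, as in a real connection.

```python
import random

def three_way_handshake():
    x = random.randrange(2**32)             # client's initial sequence number
    y = random.randrange(2**32)             # server's initial sequence number

    syn = {"flags": {"SYN"}, "seq": x}                         # 1. client -> server
    syn_ack = {"flags": {"SYN", "ACK"},                        # 2. server -> client
               "seq": y, "ack": syn["seq"] + 1}
    ack = {"flags": {"ACK"}, "ack": syn_ack["seq"] + 1}        # 3. client -> server

    for name, segment in (("SYN", syn), ("SYN+ACK", syn_ack), ("ACK", ack)):
        print(name, segment)

three_way_handshake()
```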

State-Transition Diagram

TCP is complex enough that its specification includes a state-transition diagram. This diagram shows only the states involved in opening a connection (everything above ESTABLISHED) and in closing a connection (everything below ESTABLISHED). Each circle denotes a state that one end of a TCP connection can find itself in. All connections start in the CLOSED state. As the connection progresses, the connection moves from state to state according to the arcs. Each arc is labeled with a tag of the form event/action. Thus, if a connection is in the LISTEN state and a SYN segment arrives (i.e., a segment with the SYN flag set), the connection makes a transition to the SYN RCVD state and takes the action of replying with an ACK+SYN segment.


Two kinds of events trigger a state transition: (1) a segment arrives from the peer (e.g., the event on the arc from LISTEN to SYN RCVD), or (2) the local application process invokes an operation on TCP (e.g., the active open event on the arc from CLOSED to SYN SENT). The arrival of the SYN+ACK segment causes the client to move to the ESTABLISHED state and to send an ACK back to the server. When this ACK arrives, the server finally moves to the ESTABLISHED state. On any one side, there are three combinations of transitions that get a connection from the ESTABLISHED state to the CLOSED state:

- This side closes first: ESTABLISHED → FIN WAIT 1 → FIN WAIT 2 → TIME WAIT → CLOSED.
- The other side closes first: ESTABLISHED → CLOSE WAIT → LAST ACK → CLOSED.
- Both sides close at the same time: ESTABLISHED → FIN WAIT 1 → CLOSING → TIME WAIT → CLOSED.

Flow Control

Reliable and Ordered Delivery

TCP on the sending side maintains a send buffer. This buffer is used to store data that has been sent but not yet acknowledged, as well as data that has been written by the sending application but not transmitted. On the receiving side, TCP maintains a receive buffer. This buffer holds data that arrives out of order, as well as data that is in the correct order (i.e., there are no missing bytes earlier in the stream) but that the application process has not yet had the chance to read.


In a sliding window protocol, the size of the window sets the amount of data that can be sent without waiting for acknowledgment from the receiver. Thus, the receiver throttles the sender by advertising a window that is no larger than the amount of data that it can buffer. Observe that TCP on the receive side must keep

LastByteRcvd − LastByteRead ≤ MaxRcvBuffer

to avoid overflowing its buffer. It therefore advertises a window size of

AdvertisedWindow = MaxRcvBuffer − ((NextByteExpected − 1) − LastByteRead)

which represents the amount of free space remaining in its buffer. TCP on the send side must then adhere to the advertised window it gets from the receiver. This means that at any given time, it must ensure that

LastByteSent − LastByteAcked ≤ AdvertisedWindow

Said another way, the sender computes an effective window that limits how much data it can send:

EffectiveWindow = AdvertisedWindow − (LastByteSent − LastByteAcked)

Clearly, EffectiveWindow must be greater than 0 before the source can send more data. All the while this is going on, the send side must also make sure that the local application process does not overflow the send buffer; that is,

LastByteWritten − LastByteAcked ≤ MaxSendBuffer

If the sending process tries to write y bytes to TCP, but

(LastByteWritten − LastByteAcked) + y > MaxSendBuffer

then TCP blocks the sending process and does not allow it to generate more data.
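The bookkeeping above can be checked with a short numerical sketch; the variable names follow the formulas, and the byte counts are made up for illustration.

```python
MaxRcvBuffer = 16384

# Receive side: advertise the free space remaining in the buffer.
LastByteRead = 2000
NextByteExpected = 6001
AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)

# Send side: the effective window limits how much new data may be sent.
LastByteSent = 8000
LastByteAcked = 5000
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)

assert LastByteSent - LastByteAcked <= AdvertisedWindow
print("AdvertisedWindow =", AdvertisedWindow)   # 16384 - 4000 = 12384
print("EffectiveWindow  =", EffectiveWindow)    # 12384 - 3000 = 9384
```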

Adaptive Retransmission

Because TCP guarantees the reliable delivery of data, it retransmits each segment if an ACK is not received in a certain period of time. TCP sets this timeout as a function of the RTT it expects between the two ends of the connection. Unfortunately, given the range of possible RTTs between any pair of hosts in the Internet, as well as the variation in RTT between the same two hosts over time, choosing an appropriate timeout value is not that easy. To address this problem, TCP uses an adaptive retransmission mechanism. We now describe this mechanism and how it has evolved over time as the Internet community has gained more experience using TCP.

Original Algorithm

The idea is to keep a running average of the RTT and then to compute the timeout as a function of this RTT. Specifically, every time TCP sends a data segment, it records the time. When an ACK for that segment arrives, TCP reads the time again and takes the difference between these two times as a SampleRTT. TCP then computes an EstimatedRTT as a weighted average between the previous estimate and this new sample. That is,

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT


The parameter α is selected to smooth the EstimatedRTT. A small α tracks changes in the RTT but is perhaps too heavily influenced by temporary fluctuations. On the other hand, a large α is more stable but perhaps not quick enough to adapt to real changes. The original TCP specification recommended a setting of α between 0.8 and 0.9. TCP then uses EstimatedRTT to compute the timeout in a rather conservative way:

TimeOut = 2 × EstimatedRTT

Karn/Partridge Algorithm

The problem with the original algorithm is that an ACK does not really acknowledge a transmission; it actually acknowledges the receipt of data. When a segment is retransmitted and an ACK then arrives at the sender, it is necessary to know which transmission to associate the ACK with so as to compute an accurate SampleRTT. If you assume that the ACK is for the original transmission but it was really for the second, then the SampleRTT is too large; if you assume that the ACK is for the second transmission but it was actually for the first, then the SampleRTT is too small.

The solution, which was proposed in 1987, is simple. Whenever TCP retransmits a segment, it stops taking samples of the RTT; it only measures SampleRTT for segments that have been sent only once. This solution is known as the Karn/Partridge algorithm.

Jacobson/Karels Algorithm

In the new approach, the sender measures a new SampleRTT as before. It then folds this new sample into the timeout calculation as follows:

Difference = SampleRTT − EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ × Difference)
Deviation = Deviation + δ × (|Difference| − Deviation)

where δ is a fraction between 0 and 1. That is, we calculate both the mean RTT and the variation in that mean. TCP then computes the timeout value as a function of both EstimatedRTT and Deviation as follows:

TimeOut = μ × EstimatedRTT + ϕ × Deviation

where, based on experience, μ is typically set to 1 and ϕ is set to 4. Thus, when the variance is small, TimeOut is close to EstimatedRTT; a large variance causes the Deviation term to dominate the calculation.
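Both estimators can be sketched directly from the formulas. The constants follow the text (α between 0.8 and 0.9, δ between 0 and 1, μ = 1, ϕ = 4); the starting values and RTT samples are fabricated.

```python
def original_timeout(estimated_rtt, sample_rtt, alpha=0.875):
    """Original algorithm: weighted average, then a conservative doubling."""
    estimated_rtt = alpha * estimated_rtt + (1 - alpha) * sample_rtt
    return estimated_rtt, 2 * estimated_rtt        # TimeOut = 2 x EstimatedRTT

def jacobson_karels(estimated_rtt, deviation, sample_rtt,
                    delta=0.125, mu=1.0, phi=4.0):
    """Jacobson/Karels: track both the mean RTT and its variation."""
    difference = sample_rtt - estimated_rtt
    estimated_rtt += delta * difference
    deviation += delta * (abs(difference) - deviation)
    return estimated_rtt, deviation, mu * estimated_rtt + phi * deviation

est, dev = 100.0, 10.0                             # made-up starting values (ms)
for sample in (110.0, 95.0, 180.0):                # fabricated RTT samples
    est, dev, timeout = jacobson_karels(est, dev, sample)
    print(f"sample={sample:6.1f}  est={est:6.1f}  dev={dev:5.1f}  timeout={timeout:6.1f}")
```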

Congestion

An important issue in a packet-switched network is congestion. Congestion in a network may occur if the load on the network (the number of packets sent to the network) is greater than the capacity of the network (the number of packets a network can handle). Congestion control refers to the mechanisms and techniques used to control congestion and keep the load below the capacity.


Backpressure is a node-to-node congestion control technique that starts with a node and propagates, in the opposite direction of data flow, to the source.

A choke packet is a packet sent by a node to the source to inform it of congestion. The warning goes directly from the router that has encountered congestion to the source station.

In implicit signaling, there is no communication between the congested node or nodes and the source. The source guesses that there is congestion somewhere in the network from other symptoms.

Explicit Signaling

The node that experiences congestion can explicitly send a signal to the source or destination.

Backward Signaling: A bit can be set in a packet moving in the direction opposite to the congestion. This bit can warn the source that there is congestion and that it needs to slow down to avoid the discarding of packets.

Forward Signaling: A bit can be set in a packet moving in the direction of the congestion. This bit can warn the destination that there is congestion. The receiver in this case can use policies, such as slowing down the acknowledgments, to alleviate the congestion.

TCP CONGESTION CONTROL

The essential strategy of TCP is to send packets into the network without a reservation and then to react to observable events that occur. TCP assumes only FIFO queuing in the network's routers, but also works with fair queuing. Before such control was added, the Internet suffered from congestion collapse: hosts would send their packets into the Internet as fast as the advertised window would allow, congestion would occur at some router (causing packets to be dropped), and the hosts would time out and retransmit their packets, resulting in even more congestion.

The idea of TCP congestion control is for each source to determine how much capacity is available in the network, so that it knows how many packets it can safely have in transit. Once a given source has this many packets in transit, it uses the arrival of an ACK as a signal that one of its packets has left the network and that it is therefore safe to insert a new packet into the network without adding to the level of congestion. By using ACKs to pace the transmission of packets, TCP is said to be self-clocking.

Additive Increase/Multiplicative Decrease

TCP maintains a new state variable for each connection, called CongestionWindow, which is used by the source to limit how much data it is allowed to have in transit at a given time. The congestion window is congestion control's counterpart to flow control's advertised window. TCP's effective window is revised as follows:

MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow − (LastByteSent − LastByteAcked)

That is, MaxWindow replaces AdvertisedWindow in the calculation of EffectiveWindow. Thus, a TCP source is allowed to send no faster than the slowest component, the network or the destination host, can accommodate.

The TCP source sets the CongestionWindow based on the level of congestion it perceives to exist in the network. This involves decreasing the congestion window when the level of congestion goes up and increasing the congestion window when the level of congestion goes down. Taken together, the mechanism is commonly called additive increase/multiplicative decrease (AIMD).

TCP interprets timeouts as a sign of congestion and reduces the rate at which it is transmitting. Specifically, each time a timeout occurs, the source sets CongestionWindow to half of its previous value. This halving of the CongestionWindow for each timeout corresponds to the "multiplicative decrease" part of AIMD. CongestionWindow is not allowed to fall below the size of a single packet, or in TCP terminology, the maximum segment size (MSS). Every time the source successfully sends a CongestionWindow's worth of packets, that is, each packet sent out during the last round-trip time (RTT) has been ACKed, it adds the equivalent of 1 packet to CongestionWindow. This linear increase is the "additive increase" part of AIMD.

The congestion window is incremented as follows each time an ACK arrives:

Increment = MSS × (MSS / CongestionWindow)
CongestionWindow += Increment
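The two halves of AIMD can be sketched as a pair of event handlers; MSS and the window values are in bytes and are made up for illustration.

```python
MSS = 1460

def on_ack(cwnd: float) -> float:
    """Additive increase: a fraction of MSS per ACK, about one MSS per RTT."""
    return cwnd + MSS * (MSS / cwnd)

def on_timeout(cwnd: float) -> float:
    """Multiplicative decrease: halve the window, but never below one MSS."""
    return max(cwnd / 2, MSS)

cwnd = 8 * MSS
for _ in range(8):                 # one window's worth of ACKs arrives
    cwnd = on_ack(cwnd)
print("after additive increase:", int(cwnd))       # roughly 9 * MSS
cwnd = on_timeout(cwnd)            # a timeout signals congestion
print("after multiplicative decrease:", int(cwnd))
```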

Slow Start

Slow start is used to increase the congestion window rapidly from a cold start. Slow start effectively increases the congestion window exponentially, rather than linearly. Specifically, the source starts out by setting CongestionWindow to one packet. When the ACK for this packet arrives, TCP adds 1 to CongestionWindow and then sends two packets. Upon receiving the corresponding two ACKs, TCP increments CongestionWindow by 2, one for each ACK, and next sends four packets. The end result is that TCP effectively doubles the number of packets it has in transit every RTT.
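The doubling pattern falls out of the per-ACK rule, as the sketch below shows. The threshold at which a sender leaves slow start is commonly named ssthresh; that name and value are assumptions here, since the text does not introduce them.

```python
MSS = 1460
ssthresh = 16 * MSS          # assumed switch-over point to additive increase

cwnd, rtt = 1 * MSS, 0
while cwnd < ssthresh:
    acks_this_rtt = cwnd // MSS    # one ACK comes back per segment sent
    cwnd += acks_this_rtt * MSS    # +1 MSS per ACK => doubling each RTT
    rtt += 1
    print(f"RTT {rtt}: cwnd = {cwnd // MSS} segments")
# Beyond ssthresh, the source switches to the additive increase shown earlier.
```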

There are actually two different situations in which slow start runs. The first is at the very beginning of a connection, at which time the source has no idea how many packets it is going to be able to have in transit at a given time. In this situation, slow start continues to double CongestionWindow each RTT until there is a loss, at which time a timeout causes multiplicative decrease to divide CongestionWindow by 2.

The second situation in which slow start is used is a bit more subtle; it occurs when the connection goes dead while waiting for a timeout to occur. The source then uses slow start to restart the flow of data rather than dumping a whole window's worth of data on the network all at once. Although the source is using slow start again, it now knows more information than it did at the beginning of a connection.

The quick-start mechanism allows a TCP sender to ask for an initial sending rate greater than slow start would allow by putting a requested rate in its SYN packet as an IP option. Routers along the path can examine the option, evaluate the current level of congestion on the outgoing link for this flow, and decide if that rate is acceptable, if a lower rate would be acceptable, or if standard slow start should be used. By the time the SYN reaches the receiver, it will contain either a rate that was acceptable to all routers on the path or an indication that one or more routers on the path could not support the quick-start request. In the former case, the TCP sender uses that rate to begin transmission; in the latter case, it falls back to standard slow start.

Fast Retransmit and Fast Recovery

Fast retransmit is a heuristic that sometimes triggers the retransmission of a dropped packet sooner than the regular timeout mechanism. The fast retransmit mechanism does not replace regular timeouts; it just enhances that facility. The idea of fast retransmit is straightforward. Every time a data packet arrives at the receiving side, the receiver responds with an acknowledgment, even if this sequence number has already been acknowledged. Thus, when a packet arrives out of order, when TCP cannot yet acknowledge the data the packet contains because earlier data has not yet arrived, TCP resends the same acknowledgment it sent the last time. This second transmission of the same acknowledgment is called a duplicate ACK. When the sending side sees a duplicate ACK, it knows that the other side must have received a packet out of order, which suggests that an earlier packet might have been lost. Since it is also possible that the earlier packet has only been delayed rather than lost, the sender waits until it sees some number of duplicate ACKs and then retransmits the missing packet. In practice, TCP waits until it has seen three duplicate ACKs before retransmitting the packet.

When the fast retransmit mechanism signals congestion, rather than drop the congestion window all the way back to one packet and run slow start, it is possible to use the ACKs that are still in the pipe to clock the sending of packets. This mechanism, which is called fast recovery, effectively removes the slow start phase that happens between when fast retransmit detects a lost packet and additive increase begins.
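The duplicate-ACK rule from the fast retransmit description can be sketched as a simple counter on the sending side; the ACK stream below is fabricated, with a lost segment causing the repeated cumulative ACK.

```python
DUP_ACK_THRESHOLD = 3        # TCP retransmits after three duplicate ACKs

def watch_acks(acks):
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                print(f"3 duplicate ACKs for {ack}: fast retransmit")
        else:
            last_ack, dup_count = ack, 0

# The segment after byte 3000 is lost, so later arrivals re-acknowledge 3000.
watch_acks([1000, 2000, 3000, 3000, 3000, 3000])
```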


CONGESTION-AVOIDANCE MECHANISMS

The idea is to predict when congestion is about to happen and then to reduce the rate at which hosts send data just before packets start being discarded. We call such a strategy congestion avoidance, to distinguish it from congestion control.

DECbit

The first mechanism was developed for use on the Digital Network Architecture (DNA), a connectionless network with a connection-oriented transport protocol. This mechanism could, therefore, also be applied to TCP and IP. Each router monitors the load it is experiencing and explicitly notifies the end nodes when congestion is about to occur. This notification is implemented by setting a binary congestion bit in the packets that flow through the router, hence the name DECbit. The destination host then copies this congestion bit into the ACK it sends back to the source. Finally, the source adjusts its sending rate so as to avoid congestion. The following discussion describes the algorithm in more detail, starting with what happens in the router. A single congestion bit is added to the packet header. A router sets this bit in a packet if its average queue length is greater than or equal to 1 at the time the packet arrives. This average queue length is measured over a time interval that spans the last busy+idle cycle, plus the current busy cycle.

Random Early Detection (RED)

A second mechanism, called random early detection (RED), is similar to the DECbit scheme in that each router is programmed to monitor its own queue length and, when it detects that congestion is imminent, to notify the source to adjust its congestion window. RED is most commonly implemented such that it implicitly notifies the source of congestion by dropping one of its packets. The source is, therefore, effectively notified by the subsequent timeout or duplicate ACK. As the "early" part of the RED acronym suggests, the gateway drops the packet earlier than it would have to, so as to notify the source that it should decrease its congestion window sooner than it would normally have.

The second difference between RED and DECbit is in the details of how RED decides when to drop a packet and which packet to drop. RED computes an average queue length using a weighted running average similar to the one used in the original TCP timeout computation. That is, AvgLen is computed as

AvgLen = (1 − Weight) × AvgLen + Weight × SampleLen

where 0 < Weight < 1 and SampleLen is the length of the queue when a sample measurement is made. RED has two queue length thresholds that trigger certain activity: MinThreshold and MaxThreshold. When a packet arrives at the gateway, RED compares the current AvgLen with these two thresholds, according to the following rules:

if AvgLen ≤ MinThreshold
    queue the packet
if MinThreshold < AvgLen < MaxThreshold
    calculate probability P
    drop the arriving packet with probability P
if MaxThreshold ≤ AvgLen
    drop the arriving packet

If the average queue length is smaller than the lower threshold, no action is taken, and if the average queue length is larger than the upper threshold, then the packet is always dropped. If the average queue length is between the two thresholds, then the newly arriving packet is dropped with some probability P.
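The arrival-time decision can be sketched as below. The weighted average follows the formula in the text; the linear interpolation used to compute P between the thresholds (with an assumed MaxP) is the commonly used form, which the text leaves unspecified, and the Weight here is exaggerated so the average moves visibly in a short run.

```python
import random

Weight, MaxP = 0.5, 0.02                 # Weight exaggerated for the demo
MinThreshold, MaxThreshold = 5.0, 15.0   # thresholds in packets, made up
avg_len = 0.0

def red_arrival(sample_len: float) -> str:
    global avg_len
    avg_len = (1 - Weight) * avg_len + Weight * sample_len
    if avg_len < MinThreshold:
        return "queue"                   # below the lower threshold: no action
    if avg_len < MaxThreshold:
        p = MaxP * (avg_len - MinThreshold) / (MaxThreshold - MinThreshold)
        return "drop" if random.random() < p else "queue"
    return "drop"                        # beyond MaxThreshold: always drop

for qlen in (2, 8, 12, 20, 40):          # fabricated instantaneous queue lengths
    print(qlen, red_arrival(qlen))
```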

Quality of Service

Application Requirements

Four types of characteristics are attributed to a flow: reliability, delay, jitter, and bandwidth.

Reliability

Reliability is a characteristic that a flow needs. Lack of reliability means losing a packet or acknowledgment, which entails retransmission.

Delay

Source-to-destination delay is another flow characteristic. Applications can tolerate delay in different degrees. Telephony, audio conferencing, video conferencing, and remote log-in need minimum delay, while delay in file transfer or e-mail is less important.

Jitter

Jitter is the variation in delay for packets belonging to the same flow. For example, if four packets depart at times 0, 1, 2, 3 and arrive at 20, 21, 22, 23, all have the same delay, 20 units of time. On the other hand, if the above four packets arrive at 21, 23, 21, and 28, they will have different delays: 21, 22, 19, and 25. For applications such as audio and video, the first case is completely acceptable; the second case is not. For these applications, it does not matter if the packets arrive with a short or long delay as long as the delay is the same for all packets. High jitter means the difference between delays is large; low jitter means the variation is small.

Bandwidth

Different applications need different bandwidths. In video conferencing we need to send millions of bits per second to refresh a color screen, while the total number of bits in an e-mail may not reach even a million.
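The delay and jitter arithmetic in the example can be verified directly:

```python
departures = [0, 1, 2, 3]

for arrivals in ([20, 21, 22, 23], [21, 23, 21, 28]):
    delays = [a - d for a, d in zip(arrivals, departures)]
    print(f"delays={delays}  jitter (max-min)={max(delays) - min(delays)}")
# First flow: delays all 20, zero jitter. Second flow: delays 21, 22, 19, 25.
```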

Approaches to QoS Support

Considering this rich space of application requirements, what we need is a richer service model that meets the needs of any application. This leads us to a service model with not just one class (best effort), but with several classes, each available to meet the needs of some set of applications. Towards this end, we are now ready to look at some of the approaches that have been developed to provide a range of qualities of service. These can be divided into two broad categories:

- Fine-grained approaches, which provide QoS to individual applications or flows
- Coarse-grained approaches, which provide QoS to large classes of data or aggregated traffic

In the first category, we find Integrated Services, a QoS architecture developed in the IETF and often associated with the Resource Reservation Protocol (RSVP); ATM’s approach to QoS was also in this category. In the second category lies Differentiated Services, which is probably the most widely deployed QoS mechanism at the time of writing.