The aim of this unit is to review the main concepts ...jamhour/Download/pub... · The first...


The aim of this unit is to review the main concepts related to the TCP and UDP transport protocols, as well as application protocols. These concepts are prerequisites for developing programs that communicate through an IP network. They are also important for understanding the operation of Proxy and NAT, as well as packet filters, firewalls and other security mechanisms that will be covered later in this course.


The TCP/IP architecture consists of three layers: Application, Transport and Network. The protocols used in the TCP/IP architecture are standardized and published by the IETF (Internet Engineering Task Force). Documents published by the IETF are called RFCs (Requests for Comments) and describe the operation of the protocols in detail. All RFCs are freely accessible at www.ietf.org. The lower layers (Data Link and Physical) are not considered part of the TCP/IP architecture here, as they are defined by other bodies (usually the IEEE, Institute of Electrical and Electronics Engineers). The TCP/IP architecture defines two main transport protocols: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

Because they belong to the same layer, TCP and UDP are alternatives: a given socket uses one or the other, never both at once. Both protocols are implemented by the operating system. This greatly simplifies the development of networked applications, because the details of each protocol are hidden from the application.

It is up to the application to decide which transport protocol will be used. This is done through a standard interface with the operating system called "sockets". This interface defines a set of APIs (standard function calls) for mapping applications to port numbers and for sending and receiving packets. The choice of transport protocol depends heavily on the goals of the application, as discussed in the remainder of this unit.
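
As a minimal sketch of how an application selects the transport protocol through the sockets interface, the Python snippet below creates one TCP socket and one UDP socket. The helper name `open_socket` is an assumption made for illustration; only the standard `socket` module is used.

```python
import socket

def open_socket(transport: str) -> socket.socket:
    """Return an unconnected IPv4 socket for the requested transport."""
    kinds = {"tcp": socket.SOCK_STREAM, "udp": socket.SOCK_DGRAM}
    return socket.socket(socket.AF_INET, kinds[transport])

tcp_sock = open_socket("tcp")   # stream socket: the O.S. will run TCP
udp_sock = open_socket("udp")   # datagram socket: the O.S. will run UDP
print(tcp_sock.type == socket.SOCK_STREAM, udp_sock.type == socket.SOCK_DGRAM)
tcp_sock.close()
udp_sock.close()
```

The choice is made once, when the socket is created; everything the protocol does afterwards is handled by the operating system.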


Before discussing the differences between TCP and UDP, let's look at their similarities. The goal of both protocols is to provide a mechanism for addressing processes within an operating system. As we have seen earlier in this course, this is done by using 16-bit addresses called port numbers.

The way ports are mapped to processes is defined by the socket interface. A process (user-level application) may choose a port number when it starts by using an API call named BIND. If the process starts communicating without calling BIND, the operating system assigns it a random port (usually between 1024 and 65535). The operating system ensures that a unique port is assigned to each process that communicates through the same interface (IP address).

Usually, a client process does not perform a bind operation. Server processes, on the other hand, always bind, because their port number cannot be random (clients need to know it in order to address them). The port used by a server process depends on the type of application it represents. Standard Internet applications (such as web, email, and others) use the range of well-known ports (0-1023), and binding to these ports requires “root” privileges. Other applications, often proprietary to specific vendors (such as Database Management Systems), use the range of registered ports (1024-49151), which does not require root privileges.
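
The following sketch shows one way to observe a port chosen by the operating system. It binds a UDP socket to port 0 on the loopback address, which asks the O.S. to pick any free port, and then reads the port back. The function name `bound_port` is hypothetical, and the exact range the O.S. draws from depends on the system.

```python
import socket

def bound_port(requested_port: int = 0) -> int:
    """Bind a UDP socket on the loopback address and return the port in use.

    Port 0 asks the operating system to choose a free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", requested_port))
        return sock.getsockname()[1]

port = bound_port()
print("port chosen by the O.S.:", port)
```

A server would instead pass its fixed port number to `bind`, so that clients know where to reach it.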


The TCP and UDP protocols are very different. UDP is very simple and provides little more than the service of addressing processes through port numbers. TCP, on the other hand, is a sophisticated protocol that performs various operations on behalf of the application, such as automatic confirmation of reception and automatic retransmission of lost packets.

The first difference between TCP and UDP is the presence or absence of a connection. A connection is established by an exchange of control packets between the client and the server, which occurs before the first data packet is transmitted. TCP uses control packets to create, monitor, and terminate connections. A connection is a fundamental requirement for many of the services offered by TCP. UDP transmits data packets only.

The second difference is the way data is divided into packets. In TCP, an application does not need to control how much data fits in one packet. It can simply transmit the data as a continuous stream of bytes, because the operating system (O.S.) decides when there are enough bytes to create a packet. The O.S. on the receiver reassembles the packets transparently, so the receiving application also sees the data as a stream of bytes. In the case of UDP, it is up to the application to hand the O.S. an amount of data that fits in a packet.
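
To make the UDP side of this difference concrete, the sketch below sends two messages over the loopback interface, each in its own datagram, and reads them back one datagram per receive call. The helper `udp_round_trip` and the message contents are assumptions for illustration. The timeout is there because UDP itself gives no delivery guarantee.

```python
import socket

def udp_round_trip(messages):
    """Send each message as its own UDP datagram over loopback and read them back."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))      # let the O.S. pick a free port
        rx.settimeout(2.0)             # do not block forever if a datagram is lost
        for msg in messages:
            tx.sendto(msg, rx.getsockname())
        for _ in messages:
            data, _addr = rx.recvfrom(2048)   # one call returns one datagram
            received.append(data)
    return received

print(udp_round_trip([b"first", b"second"]))
```

With a TCP stream socket, by contrast, a single receive call could return both messages joined together or only part of one, since TCP does not preserve message boundaries.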

The third difference is the confirmation of received packets and the retransmission of lost ones. In TCP, packets are confirmed by the receiver. If they are not confirmed, they are automatically retransmitted by the O.S., without intervention of the application. In UDP, the detection and retransmission of lost packets, when required, must be performed by the application.


TCP implements two sophisticated mechanisms not implemented by UDP: flow control and congestion control.

Flow control is an automatic adjustment of the packet rate performed by the transmitter, which reduces its transmission rate to prevent packet loss at the receiver due to buffer overflow. This is necessary when the receiver is not able to read the packets at the rate at which the transmitter sends them. The packets are first received by the O.S. (operating system) and stored in a buffer. If the application does not read the bytes from the buffer fast enough, the result may be a buffer overflow.

Congestion control is also a packet rate adjustment, but caused by packet loss in the network. When TCP detects packet loss, it assumes that routers had to drop packets because the network is congested. This mechanism was implemented in the early days of the Internet when it was realized that the automatic retransmission of packets without congestion control could lead to a fast collapse of the network.

The additional features offered by TCP over UDP have a cost: TCP supports only unicast transmissions. That is, you cannot use broadcast or multicast addresses on TCP connections. One of the reasons is that TCP packets need to be confirmed by the recipient. Confirmation is not possible when group or broadcast destination addresses are used, because the sender does not know how many recipients it must wait for. All applications that need to transmit broadcast or multicast packets must use UDP. TCP is also more costly in terms of CPU usage by the O.S. and in terms of the volume of data transmitted over the network. It is not recommended for applications that transmit only a few packets, or messages that cannot be delayed.


The TCP PDU (protocol data unit) is called a segment. The TCP header is shown in the figure. In addition to the source and destination port numbers, the remaining fields of the TCP header are related to the functions provided by the protocol.

The Sequence Number and Acknowledgement Number fields are related to the mechanism of reliable transmission.

The Receive Window field is related to the mechanism of flow control. The Flags field contains a set of control bits used to control the TCP connection and also the reliable transmission process.

The Urgent Pointer field is rarely used in practice. It allows the sender to tell the receiver that some data should be processed with higher priority, ahead of other data that is already buffered and waiting for processing. The Options field is not required and is often omitted from the TCP header. Because the Options field is optional, the TCP header has a variable size. The HLEN field therefore gives the size of the header in 4-byte words.
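
As an illustration of how these fields sit in the header, the sketch below decodes the 20-byte fixed part of a TCP header with Python's `struct` module. The function name `parse_tcp_header`, the dictionary keys and the sample values are assumptions for illustration. The field order follows the standard TCP header layout.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the 20-byte fixed part of a TCP header (network byte order)."""
    (src, dst, seq, ack, off_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,   # HLEN counts 4-byte words
        "flags": off_flags & 0x01FF,           # control bits (SYN, ACK, FIN, ...)
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }

# Hand-built sample header: HLEN = 5 words (no options), SYN flag (0x002) set.
sample = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
print(parse_tcp_header(sample))
```

An HLEN of 5 gives the minimum header size of 20 bytes; any larger value means that options follow the fixed part.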


When using TCP, the decision of when a segment is created and transmitted is made by the protocol and not by the application. This strategy is called stream (flow) transmission. Using the Sockets API, the application sends a continuous stream of bytes to the operating system (O.S.). Each “send” API call does not necessarily generate a packet. TCP may wait until a reasonable number of bytes has accumulated in the transmission buffer, to avoid generating too many small packets.

The segment size is a compromise between minimizing the number of packets transmitted and not causing excessive delay when the volume of data to be transmitted is small. Theoretically, the maximum size of an IP packet is 64 Kbytes, so in principle TCP could wait for about that much data (less the size of the IP and TCP headers) before generating a packet.

In modern operating systems, however, the number of bytes that TCP accumulates is defined according to the MTU of the network adapter (1500 bytes for Ethernet). This is done to prevent packets from being fragmented by the IP layer. The maximum amount of data in a segment is called the MSS (Maximum Segment Size). On Ethernet it corresponds to 1460 bytes (1500 bytes - 20 bytes of IP header - 20 bytes of TCP header).
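
The arithmetic behind the MSS can be written as a one-line function. This is only a sketch: the name `max_segment_size` is hypothetical, and the default header sizes assume IP and TCP headers without options.

```python
def max_segment_size(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one IP packet without fragmentation."""
    return mtu - ip_header - tcp_header

print(max_segment_size(1500))  # 1460 for Ethernet, as in the text
```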


To make the process of segmentation and reassembly transparent to applications, the TCP header includes the information the O.S. needs to reassemble the data at the receiver in the same order in which it was sent by the transmitter. A TCP connection is identified by four addresses: Source IP, Source Port Number, Destination IP, and Destination Port Number.

As illustrated in the figure, after a TCP connection is established, all segments are numbered using the "Sequence Number" field. The sequence number identifies the first byte in a segment, but the first segment in a connection does not start at zero. Instead, an initial sequence number (ISN) is chosen randomly. A different ISN is chosen for each connection. If the same pair of computers (A, B) terminates a connection and immediately starts another, a new ISN is used. The ISN is also unidirectional, i.e., one ISN is used for the packet flow from A to B and another for the flow from B to A.

The reason for using a random ISN is to avoid erroneous reassembly when a connection terminates and is immediately re-established using the same port numbers. Without a random ISN, delayed packets from the previous connection could be confused with packets of the new connection.


TCP implements a reliable communication process called "retransmission in absence of confirmation". As the figure shows, TCP uses the sequence number (SEQ) and confirmation number (CONF) fields to implement this strategy. The SEQ field always indicates the first byte of the segment being transmitted. The CONF field indicates the next byte that the sender expects to receive from its peer. The CONF field implicitly confirms the receipt of all bytes preceding the CONF number. That is, if a peer sends a segment with CONF=2000, it acknowledges the receipt of all bytes up to 1999.

TCP does not need to send control packets solely to confirm the receipt of data. It uses a strategy in which the same segment used to transmit new data also confirms the bytes already received. To illustrate this concept, assume that a client has to transmit two segments: segment 1 (bytes 1000-1499) and segment 2 (bytes 1500-1799). The server also has to transmit two segments: segment A (bytes 2000-2099) and segment B (bytes 2100-2989). The client transmits the first segment (500 bytes) with SEQ=1000 and CONF=2000. This means it is confirming to the server the receipt of all bytes up to 1999. The server responds with SEQ=2000 and CONF=1500. The SEQ field exactly matches the next byte expected by the client, and the CONF field confirms the receipt of the bytes corresponding to segment 1. The process continues as indicated in the figure.
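
The relation between SEQ, segment length and CONF in this example can be sketched as follows. The function name `next_conf` is hypothetical; the numbers are the ones used in the example.

```python
def next_conf(seq: int, payload_len: int) -> int:
    """Confirmation number sent after receiving bytes seq .. seq+payload_len-1."""
    return seq + payload_len

# Segment 1 of the example: bytes 1000-1499 (500 bytes).
print(next_conf(1000, 500))   # 1500, the CONF value sent back by the server
# Segment A of the example: bytes 2000-2099 (100 bytes).
print(next_conf(2000, 100))   # 2100, the next byte the client expects
```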


The retransmission technique used by TCP is based on positive acknowledgment with a timer. TCP has no error messages. If an acknowledgment does not arrive at the transmitter within a given time, the segment is retransmitted. When the receiver has no data to transmit, it can send segments that carry only a confirmation. The maximum time to wait for an acknowledgment is estimated from the average Round-Trip Time (RTT) needed to send and confirm a segment. The transmitter can adopt several techniques to estimate the RTT. A common strategy is as follows:

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

Deviation = 0.875 * Deviation + 0.125 * |SampleRTT - EstimatedRTT|

Timer = EstimatedRTT + 4 * Deviation

where:

SampleRTT: latest measurement of the RTT

EstimatedRTT: smoothed average of the RTT

Timer: maximum time to wait for a confirmation

Deviation: a measure of the fluctuation of the RTT
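
The estimator above can be sketched in a few lines of Python. The coefficients come from the text. The class name `RttEstimator`, the choice of seeding the estimate with the first sample, the initial deviation of zero, and the use of the absolute difference (so that the deviation never becomes negative) are assumptions of this sketch. The sample values are made up.

```python
class RttEstimator:
    """Exponentially weighted RTT estimate with a deviation-based timer."""

    def __init__(self, first_sample: float):
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.deviation = 0.0                # assumption: start with no deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one RTT measurement and return the new timer value."""
        self.estimated_rtt = 0.875 * self.estimated_rtt + 0.125 * sample_rtt
        self.deviation = (0.875 * self.deviation
                          + 0.125 * abs(sample_rtt - self.estimated_rtt))
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=0.100)
for sample in (0.100, 0.120, 0.300, 0.110):   # hypothetical RTTs in seconds
    print(round(est.update(sample), 4))
```

A single large sample (0.300 here) raises the timer mostly through the deviation term, which is what keeps the transmitter from retransmitting too early when the RTT fluctuates.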

The receiver does not confirm segments received out of order. Instead, for each segment received out of order, it repeats the confirmation number of the last segment received in the correct sequence. If the transmitter receives three duplicate acknowledgments (segments repeating the same acknowledgment number), it retransmits the missing segment immediately. This technique is called fast retransmit, because the retransmission happens before the retransmission timer expires. The figure presents a summary of key recommendations on the operation of TCP, described in RFCs 1122 and 2581.


The ACK, SYN and FIN bits (defined in the Flags field of the TCP header) are used to control the opening and closing of TCP connections. The ACK flag indicates a confirmation of receipt. It is always 0 in the first segment sent by the client (because there is nothing to confirm yet) and 1 everywhere else. The SYN flag controls the ISN synchronization. It is 1 in the first two segments exchanged between the client and the server, and 0 everywhere else. The FIN flag is the termination flag. It is set to 1 to indicate that a connection must be terminated.

The beginning of a TCP connection defines the ISNs (Initial Sequence Numbers) used by the client and the server. This involves the exchange of three segments:

1) The client sends a request to open a connection (SYN segment). This segment defines the initial value of the sequence number of the client (C_ISN), and is identified by the flags SYN=1 and ACK=0.

2) The server confirms the connection (SYNACK segment). This segment reports the ISN of the server (S_ISN), and is identified by the flags SYN=1 and ACK=1.

3) The client confirms the receipt of the SYNACK segment. After this stage, data can be exchanged indefinitely between client and server. Note that during the exchange of data SYN=0 and ACK=1.
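
The three steps above can be sketched as a small simulation that only builds the flag and number values of each segment, without touching the network. The function name `three_way_handshake` and the dictionary layout are hypothetical. The sketch also relies on one fact not spelled out above: a SYN consumes one sequence number, so each side confirms the peer's ISN plus one.

```python
import random

def three_way_handshake(rng: random.Random):
    """Return the three segments of a TCP connection opening as dicts."""
    c_isn = rng.randrange(2**32)   # client initial sequence number (C_ISN)
    s_isn = rng.randrange(2**32)   # server initial sequence number (S_ISN)
    syn = {"from": "client", "SYN": 1, "ACK": 0, "seq": c_isn}
    synack = {"from": "server", "SYN": 1, "ACK": 1, "seq": s_isn,
              "conf": (c_isn + 1) % 2**32}
    ack = {"from": "client", "SYN": 0, "ACK": 1, "seq": (c_isn + 1) % 2**32,
           "conf": (s_isn + 1) % 2**32}
    return [syn, synack, ack]

for segment in three_way_handshake(random.Random(1)):
    print(segment)
```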

A connection may be closed on the initiative of either the client or the server. A TCP connection is bidirectional, so closing it requires both client and server to send termination requests. Closing a connection takes four segments. In the example, the client initiates the procedure. The client sends a segment with FIN=1. The server confirms it and then sends its own segment with FIN=1, which the client confirms in turn.


The reception process is transparent to the application, since TCP is implemented by the O.S. When a TCP segment arrives, it is first received by the operating system (O.S.) and stored in a buffer. However, this buffer has limited capacity. The receiving application must be able to remove data from the buffer at a rate compatible with the rate of the transmitter. If the receiver is very slow (or the application is poorly written), the receive buffer may overflow. When the buffer is full, the O.S. discards any further segments it receives. In this condition, the automatic retransmission of lost packets will probably worsen the situation for both the receiver and the network.

Flow control is the TCP mechanism that prevents this from happening. With this mechanism, the receiver reports, along with every confirmation, how much buffer space it still has available, using the Receive Window (RcvWindow) field of the TCP header.

To illustrate how flow control works, consider the scenario in the figure, where computer A is the transmitter and computer B is the receiver. The algorithm for calculating the Receive Window uses three parameters:

RcvBuffer = size of the reception buffer of B

LastByteRead = last byte read by the application on B

LastByteRcvd = last byte received by the O.S. on B

The receive window sent from B to A is defined as:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
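
A direct transcription of this formula is sketched below. The function name `rcv_window` and the numbers in the example call are hypothetical.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free space, in bytes, that receiver B advertises to transmitter A."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# 8000 bytes are waiting in the buffer, so 65535 - 8000 bytes remain free.
print(rcv_window(rcv_buffer=65535, last_byte_rcvd=12000, last_byte_read=4000))
```

When the application on B stops reading, LastByteRead stops advancing and the advertised window shrinks toward zero, which forces A to slow down.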


In practice, TCP uses a second window that limits the transmission rate. This window is called the Congestion Window (CongWin). Unlike the Receive Window, the CongWin does not have a corresponding field in the TCP header. It is calculated internally by the O.S. of the transmitter, based on the success or failure of segment transmissions. Every time a segment is lost, TCP assumes that the network is congested and reduces the rate of the transmitter. When segments are transmitted successfully, the CongWin is increased. The CongWin is calculated in multiples of the MSS (Maximum Segment Size = 1460 bytes).

The figure illustrates how the CongWin evolves over time. Initially, the window is set to 1 MSS and grows by 1 MSS for every segment successfully confirmed, which doubles the window at every RTT. This phase is called “exponential growth” (or slow start) and continues until a certain Threshold is reached. From that point, the CongWin enters a “congestion avoidance” phase, where the growth is slower (just 1 MSS per RTT). In case of failure, the Threshold is set to half of the current CongWin, and the CongWin itself is reduced, either by half or back to 1 MSS depending on the TCP version and the type of failure.

The algorithm to compute the CongWin can be summarized as follows:

a) Initialization:

CongWin = 1 MSS (Maximum Segment Size = 1460 bytes)

Threshold = 65 Kbytes

b) Growth, at each successfully acknowledged segment:

if CongWin < Threshold (exponential growth phase): CongWin = CongWin + MSS

i.e., CongWin = CongWin * 2 per RTT

otherwise (congestion avoidance phase): CongWin = CongWin + MSS * (MSS / CongWin)

i.e., CongWin = CongWin + 1 MSS per RTT

c) When a segment is received out of order:


At a given instant, the maximum transmission rate is determined by the smaller of the two windows defined by flow control and congestion control. This maximum transmission rate is calculated as follows:

max-rate = [ min(CongWin, RcvWindow) - (LastByteSent - LastByteAcked) ] / RTT   (bytes/s)

where:

LastByteSent: last byte sent by the transmitter

LastByteAcked: last byte confirmed by the receiver
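
The formula can be sketched as a small function. The name `max_rate`, the clamp to zero when more data is in flight than the window allows, and the numbers in the example call are assumptions of this sketch.

```python
def max_rate(cong_win, rcv_window, last_byte_sent, last_byte_acked, rtt):
    """Upper bound on the sending rate, in bytes per second."""
    in_flight = last_byte_sent - last_byte_acked       # sent but not yet confirmed
    usable = min(cong_win, rcv_window) - in_flight     # room left in the window
    return max(usable, 0) / rtt                        # assumption: never negative

# CongWin of 10 MSS, a large receive window, 2 MSS in flight, RTT of 0.5 s.
print(max_rate(14600, 65535, 5840, 2920, 0.5))
```

In this call the congestion window is the smaller of the two, so it is congestion control, not flow control, that limits the transmitter.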

TCP considers two types of failure: “segment lost” (i.e., the retransmission timer expires without any acknowledgment from the receiver) and “segments out of order” (i.e., the receiver sends duplicate acknowledgments). The first event is considered more severe than the second, because out-of-order segments mean that some segments are still being received.

There are several variants of TCP that differ in the way the congestion control mechanism reacts to these failure events.

The Tahoe version is the oldest, and returns to slow start (CongWin = 1 MSS) for any type of failure event.

The Reno version is more recent. It performs a fast recovery (CongWin = CongWin/2) in the case of out-of-order segments, and returns to slow start (CongWin = 1 MSS) in the case of lost segments.
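
The growth rules and the two reactions to failure can be put side by side in a short sketch. It is a simplified model, not an implementation of any real TCP stack: the function names, the event labels "timeout" (segment lost) and "dup_acks" (segments out of order), and the floor of one MSS for the Threshold are assumptions made for illustration.

```python
MSS = 1460

def on_ack(cong_win: float, threshold: float) -> float:
    """Grow the window after one acknowledged segment."""
    if cong_win < threshold:
        return cong_win + MSS                   # exponential growth phase
    return cong_win + MSS * MSS / cong_win      # congestion avoidance phase

def on_failure(cong_win: float, event: str, variant: str):
    """Return (new_cong_win, new_threshold) after a failure event.

    event is "timeout" (segment lost) or "dup_acks" (segments out of order).
    """
    threshold = max(cong_win / 2, MSS)   # assumption: never below one MSS
    if variant == "reno" and event == "dup_acks":
        return threshold, threshold             # fast recovery: halve the window
    return float(MSS), threshold                # back to slow start

print(on_failure(16 * MSS, "dup_acks", "reno"))
print(on_failure(16 * MSS, "dup_acks", "tahoe"))
```

For the same out-of-order event, Reno continues from half of the previous window while Tahoe restarts from 1 MSS, which is why Reno recovers its rate faster.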


The UDP (User Datagram Protocol) header is much simpler than the TCP header, because UDP does not offer any functionality beyond port number addressing. The PDU of UDP is called a datagram, which is also a frequent synonym for packet. The “User” in the name indicates that packets are created at the user (application) level.

Despite having a field for error checking (Checksum), UDP offers no confirmation service to the transmitter and no retransmission of lost packets. It does not use connections, and it cannot segment and reassemble data transparently for the application. That does not mean it is impossible to build applications over UDP that are reliable or capable of transmitting large volumes of data. It simply means that these additional features must be implemented at the application level, because they are not offered by the operating system. For example, NFS (Network File System), which allows Unix systems to share directories over the network, was originally built on UDP.

In many cases, developers choose not to use TCP because of performance issues. This is particularly true for applications that are sensitive to delay and jitter (i.e., real-time applications). For example, in VoIP (Voice over IP) it is not worth retransmitting lost packets, since a retransmitted packet would usually arrive too late to be played and the ability to re-order packets at the VoIP terminal is limited. UDP is also essential for applications that need to transmit messages to multicast or broadcast addresses, because TCP supports only unicast mode.


Code sharing is much harder at the application layer than at the transport layer. While the TCP, UDP and IP implementations can be reused by many different applications, application protocols are too specialized to be shared. Thus, it is not worth implementing application protocols at the operating system level. They are embedded in the client and server applications.

The purpose of application protocols is to allow communication between programs developed by different vendors. Application protocols related to IP networks are standardized by the IETF in the form of RFCs. Many protocols used on the Internet handle only text messages. In these protocols, fields are often separated by a line terminator (the newline character \n or, as in HTTP, the sequence \r\n). For example, an HTTP message from a client requesting the “http://espec.ppgia.pucpr.br/~jamhour/welcome.html” page may have the following format:

GET /~jamhour/welcome.html HTTP/1.1\r\n

Host: espec.ppgia.pucpr.br\r\n

Cache-Control: no-cache\r\n

\r\n
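
As a sketch of how a client could assemble the message above, the function below joins the request line and the header lines with \r\n and ends the headers with an empty line. The function name `build_get_request` is hypothetical; the host, path and headers are the ones from the example.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Assemble a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Cache-Control: no-cache",
        "",   # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("espec.ppgia.pucpr.br", "/~jamhour/welcome.html")
print(request)
```

The resulting bytes are what the client would write to a TCP socket connected to the server.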

In protocols that can carry only text, such as e-mail (SMTP), content that includes non-printable characters (such as pictures or videos) must be encoded using algorithms such as base64. These algorithms can encode any binary information as text characters, so that it can be transmitted without breaking the protocol. HTTP is text-based in its headers only; the body of an HTTP message can carry binary data directly.
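
A short sketch with Python's standard `base64` module shows the idea. The four sample bytes are arbitrary values chosen because they are not printable as text.

```python
import base64

binary = bytes([0x00, 0xFF, 0x10, 0x80])          # not printable as text
encoded = base64.b64encode(binary).decode("ascii")
print(encoded)                                    # only letters, digits, +, / and =
assert base64.b64decode(encoded) == binary        # decoding restores the bytes
```

The price of this encoding is size: every 3 bytes of binary data become 4 text characters.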


As discussed earlier, the IANA (Internet Assigned Numbers Authority) defines a standard assignment of port numbers to TCP and UDP applications. These standard ports are called "Well-Known Ports". The figure illustrates the port numbers associated with some well-known application protocols. Well-known ports are mostly used on the server side.

On the client side, the port number is dynamic: it is chosen by the operating system when the client application requests a connection to the server. There are some peer-to-peer protocols, such as SMB (Server Message Block), where the client-server paradigm does not hold and the fixed port number is used by all peers. For example, wget is a command-line HTTP client used in Linux. When you type the following command, the wget client receives a dynamic port that remains associated with it during the transfer of the “vlan.tar.gz” file:

wget http://espec.ppgia.pucpr.br/~jamhour/vlan.tar.gz

The port number assigned to wget is released after the end of the file download, because the TCP connection is terminated. When you type the wget command, it is not necessary to specify the port on which the HTTP server is listening. This is because the wget client assumes by default that the server is listening on port 80. However, it is possible to make server applications listen on alternative port numbers. If the HTTP server “espec” is listening on port 8080 (not the default), you must tell the wget client the server's port number as follows:

wget http://espec.ppgia.pucpr.br:8080/~jamhour/vlan.tar.gz
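
The rule a client applies to find the server port can be sketched with the standard `urllib.parse` module: use the port written in the URL if there is one, otherwise fall back to the default for the scheme. The function name `server_port` and the small table of defaults are assumptions of this sketch. It does not describe how wget itself is implemented.

```python
from urllib.parse import urlsplit

def server_port(url: str) -> int:
    """Port a client would connect to: explicit if given, else the scheme default."""
    defaults = {"http": 80, "https": 443}
    parts = urlsplit(url)
    return parts.port if parts.port is not None else defaults[parts.scheme]

print(server_port("http://espec.ppgia.pucpr.br/~jamhour/vlan.tar.gz"))
print(server_port("http://espec.ppgia.pucpr.br:8080/~jamhour/vlan.tar.gz"))
```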

Most server applications in Linux have a name ending with "d". This is because applications that run in the background (with no visible user interface) are called daemons in Linux.


In this unit, we reviewed the main concepts related to the transport and application layers of the TCP/IP architecture. Deeper knowledge about the operation of TCP and UDP is needed, for example, in network security. Many attacks against applications, such as “stealing a TCP connection”, are performed at the transport level.

Also, information about the TCP flags is used in firewall rules to protect against “port spoofing” attacks. Knowledge of how port numbers are assigned is also important for understanding the operation of the Proxy and NAT mechanisms, discussed later in this course. The concept of “application protocol” is also necessary to understand the operation of Proxies.