
www.ischool.drexel.edu

INFO 330 Computer Networking Technology I

Chapter 3

The Transport Layer

Dr. Jennifer Booker

INFO 330 Chapter 3


Transport Layer

• The Transport Layer handles logical communication between processes
– Routers don’t use it, so it’s the last thing a client process and the first thing a server process sees of a packet
– By logical communication, we mean that the means used to get between processes, and the distance covered, are irrelevant



Transport vs Network

• Notice we didn’t say ‘hosts’ in the previous slide; that’s because
– The network layer provides logical communication between hosts

• Mail analogy
– Let’s assume cousins (processes) want to send letters to each other between their houses (hosts)
– They use their parents (transport layer) to mail the letters, and sort the mail when it arrives




– The letters travel through the postal system (network layer) to get from house to house

• The transport layer doesn’t participate in the network layer activities (e.g. most parents don’t work in the mail distribution centers)
– The transport layer protocols are localized in the hosts
– Routing isn’t affected by anything the transport layer added to the messages




• Following the analogy, different family members might pick up and sort the mail; they’re like different transport layer protocols

• And the transport layer protocols (parents) are often at the mercy of what services the network layer (postal system) provides
– Some services can be provided at the transport layer, even if the network layer doesn’t provide them (e.g. reliable data transfer or encryption)



Two Choices

• Here we choose between TCP and UDP
– In the transport layer, a packet is a segment
– In the network layer, a packet is a datagram

• The network layer is home to the Internet Protocol (IP)
– IP provides logical communication between hosts
– IP makes a “best effort” to get segments where they belong: no guarantees of delivery, delivery sequence, or delivery integrity



IP

• Each host has an IP address
• The common purpose of UDP and TCP is to extend delivery of IP data to the host’s processes
– This is called transport-layer multiplexing and demultiplexing
– Both UDP and TCP also provide error checking

• That’s it for UDP – data delivery and error checking!



TCP

• TCP also provides reliable data transfer (not just data delivery)
– Uses flow control, sequence numbers, acknowledgements, and timers to ensure data is delivered correctly and in order

• TCP also provides congestion control
– TCP applications share the available bandwidth (they watched Sesame Street!)

– UDP takes whatever it can get (greedy little protocol)



Multiplexing & Demultiplexing

• At the destination host, the transport layer gets segments from the network layer

• Needs to deliver these segments to the correct process on that host
– Do so via sockets, which connect processes to the network
– Each socket has a unique identifier, whose format varies for UDP and TCP




• Demultiplexing is getting the transport layer segment into the correct socket

• Hence multiplexing is gathering data from various sockets, adding header info, breaking it into segments, and passing them to the network layer

• Multiplexing and demultiplexing are used in any kind of network; not just in the Internet protocols




Figure: Multiplexing and demultiplexing across hosts 1, 2, and 3 (each with application, transport, network, link, and physical layers; processes P1 to P4, each attached to a socket). Multiplexing at the sending host: gathering data from multiple sockets and enveloping it with a header (later used for demultiplexing). Demultiplexing at the receiving host: delivering received segments to the correct socket.



Mail Analogy

• Multiplexing is when a parent collects letters from the cousins, and puts them into the mail

• Demultiplexing is getting the mail, and handing the correct mail to each cousin

• Here we need unique socket identifiers, and some place in the header for the socket identifier information



Segment Header

• Hence the segment header starts with the source and destination port numbers

• Each port number is a 16-bit (2-byte) value (0 to 65,535)
– Well-known port numbers are from 0 to 1023 (2^10 - 1)

• After the port numbers are other headers, specific to TCP or UDP, then the message



UDP Multiplexing

• UDP assigns a port number from 1024 to 65,535 to each socket, unless the developer specifies otherwise
– UDP identifies a socket only by destination IP address and destination port number

• The port numbers for source and destination are switched (inverted) when a reply is sent
– So a segment from port 19157 to port 46428 generates a reply from port 46428 to port 19157


TCP Multiplexing

• TCP is messier, of course
• TCP identifies a socket by four values:
– Source IP address, source port number, destination IP address, and destination port number

• Hence if UDP gets two segments with the same destination IP and port number, they’ll both go to the same socket (and process), even if they came from different senders
– TCP tells such segments apart via source IP/port



TCP Multiplexing

• So if you have two HTTP sessions going to the same web server and page, how can TCP tell them apart?
– Even though the destination IP and port (80) are the same, and the two sessions (processes) have the same source IP address, they have different source port numbers
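To make the UDP/TCP difference concrete, here is a minimal Python sketch of the two demultiplexing lookups. The socket tables, function names, and addresses are hypothetical illustrations, not an operating system API; real systems also keep a separate listening socket for incoming TCP connection requests, which this sketch leaves out.

```python
# Hypothetical socket tables, keyed by the identifier each protocol uses.
udp_sockets = {}  # (dst_ip, dst_port) -> socket
tcp_sockets = {}  # (src_ip, src_port, dst_ip, dst_port) -> socket


def demux_udp(src_ip, src_port, dst_ip, dst_port):
    """UDP ignores the source fields when choosing a socket."""
    return udp_sockets.get((dst_ip, dst_port))


def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    """TCP needs all four values to find the connection's socket."""
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


# One UDP socket receives segments from every sender...
udp_sockets[("192.0.2.10", 53)] = "dns-socket"
# ...but two HTTP sessions from the same client get separate TCP sockets,
# told apart only by their source port numbers.
tcp_sockets[("198.51.100.7", 50001, "192.0.2.10", 80)] = "http-session-1"
tcp_sockets[("198.51.100.7", 50002, "192.0.2.10", 80)] = "http-session-2"

print(demux_udp("203.0.113.5", 1111, "192.0.2.10", 53))    # dns-socket
print(demux_udp("198.51.100.7", 2222, "192.0.2.10", 53))   # dns-socket
print(demux_tcp("198.51.100.7", 50002, "192.0.2.10", 80))  # http-session-2
```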



Port scanning

• Apps called port scanners (e.g. nmap) can scan the ports on a computer and see which are open
– This tells us what apps are running on that host
– An attacker can then target attacks on those apps

• A big security vulnerability is to leave open ports you aren’t using
– They could accept hostile TCP connections



Web Servers & TCP

• Each new client connection often uses a new process and socket to send HTTP requests and get responses
– But a thread (lightweight process) can be used instead, so one process can have multiple sockets, one for each thread

Figure: Two ways a host can handle connections. Either each connection is a new thread off one process (P1 with sockets S1, S2, S3), OR each connection is a new process (P1 with S1, P2 with S2, P3 with S3).



UDP

• The most minimal transport layer has to do multiplexing and demultiplexing

• UDP does this and a little error checking and, well, um, that’s about it!
– UDP was defined in RFC 768
– An app that uses UDP almost talks directly to IP
– UDP adds only two small fields to the header (length and checksum), after the requisite source/destination port numbers
– There’s no handshaking; UDP is connectionless



UDP for DNS

• DNS uses UDP
• A DNS query is packaged into a segment, and is passed to the network layer
– The DNS app waits for a response; if it doesn’t get one soon enough (times out), it tries another server or reports no reply

• Hence the app must allow for the unreliability of UDP, by planning what to do if no response comes back



UDP Advantages

• Still, UDP is good when:
– You want the app to have detailed control over what is sent across the network; UDP changes it little
– You want no connection establishment delay
– You want no connection state data in the end hosts; hence a server can support more UDP clients than TCP clients
– You want small packet header overhead per segment

• TCP uses 20 bytes of header data, UDP only 8 bytes



UDP Apps

• Other than DNS, UDP is also used for
– Network management (SNMP)
– Routing (RIP)
– Multimedia & telephony (proprietary protocols)
– Remote file server (NFS)

• The lack of congestion control in UDP can be a problem when lots of large UDP messages are being sent: they can crowd out TCP apps



UDP Header

• The UDP header has four two-byte fields in two lines (8 B total), namely:
– Source port number; Destination port number
– Length; Checksum

• Length is the total length of the segment, including headers, in bytes

• The checksum is used by the receiver to see if errors occurred
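The four header fields can be sketched with Python's standard `struct` module, which packs them as 16-bit big-endian (network byte order) values. The function name `udp_header` is a hypothetical helper; it leaves the checksum as 0 (which in IPv4 means "no checksum computed"), whereas a real UDP sender computes it over a pseudo-header plus the segment.

```python
import struct


def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network (big-endian) byte order."""
    length = 8 + len(payload)  # total segment length: 8-byte header + data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


hdr = udp_header(19157, 46428, b"hello")
print(len(hdr))                     # 8
print(struct.unpack("!HHHH", hdr))  # (19157, 46428, 13, 0)
```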



Checksum

• Noise in the transmission lines can corrupt bits of data in transit (flipping 0s to 1s, or 1s to 0s)

• Checksums are a common method to detect errors (RFC 1071)

• To create a checksum:
– Add up the message as a sequence of 16-bit words, wrapping any carry out of the top bit back around into the lowest bit
– The checksum is the 1s (ones) complement of the sum
– If the message is uncorrupted, the sum of the message plus the checksum is all ones: 1111111111111111
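Here is a minimal Python sketch of that recipe in the style of RFC 1071. The three sample 16-bit words (0110011001100000, 0101010101010101, 1000111100001100) are chosen for illustration, and the function names are hypothetical; UDP's real checksum also covers a pseudo-header, which is left out here.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add data as 16-bit big-endian words, wrapping each carry back into bit 0."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """The checksum is the 1s complement (all 16 bits flipped) of the sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def looks_intact(data: bytes, checksum: int) -> bool:
    """Receiver check: sum of the data plus the checksum should be all ones."""
    total = ones_complement_sum16(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


words = bytes([0x66, 0x60, 0x55, 0x55, 0x8F, 0x0C])  # three 16-bit words
c = internet_checksum(words)
print(hex(c))                                            # 0xb53d
print(looks_intact(words, c))                            # True
print(looks_intact(bytes([0x66, 0x61]) + words[2:], c))  # False (one bit flipped)
```

Note that the check only detects errors: a corrupted segment that happens to produce the same sum would slip through.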



1s Complement?

• The 1s complement is a mirror image of a binary number: change all the zeros to ones, and ones to zeros
– So the 1s complement of 00101110101 is 11010001010

• UDP does error checking because not all lower layer protocols do error checking
– This provides end-to-end error checking, which is more efficient than checking at every step along the way



UDP

• That’s it for UDP!

• The port numbers, the message length, and a checksum to see if it got there intact

• Now see what happens when we want reliable data transfer



Reliable Data Transfer

• Distinguish between the service model, and how it’s really implemented
– Service model: From the app perspective, it just wants a reliable transport layer to connect sending and receiving processes

– Service implementation: In reality, the transport layer has to use an unreliable network layer (IP), so transport has to make up for the unreliability below it




• The sending process gives the transport layer a message by calling rdt_send (rdt = reliable data transfer)
– The transport protocol converts it into a udt_send call (udt = unreliable data transfer; Fig 3.8 has a typo) to give it to the network layer

• At the receiving end, the protocol gets the packet from the network layer via rdt_rcv
– The protocol then calls deliver_data to give it to the receiving application process




Figure: The app sees this “service model” (a reliable channel between processes), but our transport protocol has to do this: provide it on top of an unreliable network layer.




• Here we’ll refer to the data as packets, rather than distinguishing segments, etc.

• Also, we’ll pretend we only have to send data in one direction (unidirectional data transfer)
– Bidirectional data transfer is what really occurs, but the sending and receiving sides just get switched

• Time to build a reliable data transfer protocol, one piece at a time



Reliable Data Transfer v1.0

• For the simplest case, called rdt1.0, assume the network is completely reliable

• Finite state machines (FSMs) for the sender and receiver each have one state: waiting for a call
– The sending side (rdt_send) makes a packet (make_pkt) and sends it (udt_send)
– The receiving side (rdt_rcv) extracts data from the packet (extract), and delivers it to the receiving app (deliver_data)




Figure: rdt1.0 FSMs. Sender, in state “Wait for call from above”: on rdt_send(data), packet = make_pkt(data); udt_send(packet). Receiver, in state “Wait for call from below”: on rdt_rcv(packet), extract(packet,data); deliver_data(data).

• Here a packet is the only unit of data

• No feedback to sender is needed to confirm receipt of data, and no control over transmission rate is needed




• Now allow bit errors in transmission
– But all packets are received, in the correct order

• Need acknowledgements to know when a packet was correct (“OK, 10-4”) versus when it wasn’t (“please repeat”); called positive and negative acknowledgements, respectively
– These types of messages are typical for any Automatic Repeat reQuest (ARQ) protocol



• So allowing for bit errors requires three capabilities
– Error detection to know if a bit error occurred
– Receiver feedback, both positive (ACK) and negative (NAK) acknowledgements
– Retransmission of incorrect packets




Figure: rdt2.0 FSMs.
Sender: in “Wait for call from above”, on rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”. There, on rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt) and keep waiting; on rdt_rcv(rcvpkt) && isACK(rcvpkt): go back to “Wait for call from above”.
Receiver: in “Wait for call from below”, on rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt,data); deliver_data(data); udt_send(ACK).




• Sending FSM (cont.)
– The left state waits for data from the sending app, and makes a packet with a checksum (make_pkt)
– Then the left state sends the packet (udt_send)
– It moves to the other state (waiting for ACK/NAK)

• If it gets a NAK response (errors detected), then it resends the packet (udt_send) and keeps waiting, until it gets it right

• If it gets an ACK response (no errors), then it goes back to the left state to wait for the next data from the app




• Notice this model does nothing else until it gets the NAK/ACK, so it’s a stop-and-wait protocol

• Receiving FSM
– The receiving side uses the checksum to see if the packet was corrupted
• If it was (&& corrupt), send a NAK response
• If it wasn’t (&& notcorrupt), extract and deliver the data, and send an ACK response

• But what if the NAK/ACK is corrupted?



• Three possible ways to handle NAK/ACK errors
– Add another type of response to have the NAK/ACK repeated; but what if that response got corrupted? Leads to a long string of messages…
– Add checksum data to the NAK/ACK, and enough data to recover from the error
– Resend the packet if the NAK/ACK is garbled; but this introduces possible duplicate packets



• TCP and most reliable protocols add a sequence number to the data from the sender
– Since we can’t lose packets yet, a one-bit number is adequate to tell if this is a new packet or a repeat of the previous one

• This gives our new model rdt version 2.1




Figure: rdt2.1 sender FSM with four states: “Wait for call 0 from above”, “Wait for ACK or NAK 0”, “Wait for call 1 from above”, and “Wait for ACK or NAK 1”. On rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt). In “Wait for ACK or NAK 0”: on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ), udt_send(sndpkt); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt), go to “Wait for call 1 from above”. The sequence-1 half mirrors this with make_pkt(1, data, checksum).




• Now the number of states is doubled, since we have sequence numbers 0 or 1
– So in make_pkt(1, data, checksum) the 1 is the sequence number
• The sequence number alternates 010101 if everything works; if a packet is corrupted, the same sequence number is expected two or more times

• Start at the ‘Wait for call 0’ state; when data comes from the app, send it to the network as a packet with sequence 0
– Then wait for an ACK or NAK for sequence 0




– If the response was corrupt, or was a NAK, resend that packet (upper right loop)
• Otherwise wait for a call with sequence 1 from the app
– When call 1 is received, make and send the packet with sequence 1 (desired outcome)
• Then wait for a NAK/ACK for sequence 1
– If corrupt or got a NAK, resend (lower left loop)
• Otherwise go to waiting for a sequence 0 call from the app
– Repeat cycle




Figure: rdt2.1 receiver FSM with states “Wait for 0 from below” and “Wait for 1 from below”. On rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt). On rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt). While waiting for 0, on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt). The “Wait for 1” state mirrors this.




• The receiver side doubles in number of states
• When in the ‘wait for seq 0’ state
– If the packet has sequence 0 and isn’t corrupt, extract and deliver the data, and send an ACK; go to the ‘wait for seq 1’ state
– If the packet was corrupt, reply with a NAK
– If the packet has sequence 1 and was not corrupt, it’s a duplicate of the packet already received (the sender missed our ACK), so send an ACK and keep waiting for a seq 0 packet

• Mirror the above for starting from the ‘wait for seq 1’ state
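The receiver rules above fit in a few lines. This is a minimal Python sketch, not the textbook's code: the `Packet` class and its `corrupt` flag are hypothetical stand-ins for a real packet and a failed checksum test, and the reply is returned as a string instead of being sent with udt_send.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int               # 0 or 1
    data: str
    corrupt: bool = False  # stands in for a failed checksum test


class Rdt21Receiver:
    """Sketch of the rdt2.1 receiver FSM (two states: waiting for seq 0 or 1)."""

    def __init__(self):
        self.expected = 0    # start in 'wait for seq 0 from below'
        self.delivered = []  # what deliver_data has handed to the app

    def rdt_rcv(self, pkt: Packet) -> str:
        """Handle one arriving packet and return the response to send back."""
        if pkt.corrupt:
            return "NAK"                     # corrupt: ask for a resend
        if pkt.seq == self.expected:
            self.delivered.append(pkt.data)  # extract + deliver_data
            self.expected ^= 1               # move to the other state
        # A good packet with the other sequence number is a duplicate:
        # ACK it again, but do not deliver it twice.
        return "ACK"


r = Rdt21Receiver()
replies = [r.rdt_rcv(p) for p in [Packet(0, "a"), Packet(1, "b", corrupt=True),
                                  Packet(1, "b"), Packet(1, "b")]]
print(replies)      # ['ACK', 'NAK', 'ACK', 'ACK']
print(r.delivered)  # ['a', 'b']
```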




• We could achieve the same effect without a NAK (for a corrupt packet) if the receiver instead ACKs the last correctly received packet

• Two ACKs for the same packet (duplicate ACKs) mean the packet after the ACKed one wasn’t received correctly

• The NAK-free protocol is called rdt2.2





Figure: rdt2.2 FSM fragments.
Sender fragment: from “Wait for call 0 from above”, on rdt_send(data) send the sequence-0 packet and go to “Wait for ACK 0”. There, on rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): udt_send(sndpkt); on rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): move on.
Receiver fragment: after a good sequence-1 packet: extract(rcvpkt,data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt), then wait for 0 from below. There, on rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)): udt_send(sndpkt), i.e. resend ACK1.




• Again, the send and receive FSMs are symmetric for sequence 0 and 1
– The sender must now check the sequence number of the packet being ACK’d (see the isACK message)
– The receiver must include the sequence number in the make_pkt message

• The FSM on page 211 also has a oncethru variable to help avoid duplicate ACKs




• Now account for the possibility of lost packets
• Need to detect packet loss, and decide what to do about it
– The latter is easy with the tools we have (ACK, checksum, sequence #, and retransmission), but we need a new detection mechanism

• Many possible loss detection approaches
– Focus on making the sender responsible for it




• From the sender’s point of view, a packet is lost if it doesn’t get to the receiver, or if its ACK gets lost

• We can’t wait for the worst-case transmission time, so pick a reasonable time before error recovery is started
– Could result in duplicate packets if the original was still on the way; but rdt2.2 can handle that

• For the sender, retransmission is the ultimate solution, whether the packet or the ACK was lost




• Knowing when to retransmit needs a countdown timer
– Count time from sending a packet while still not getting an ACK
• If the time is exceeded, retransmit that packet
• Works the same if the packet is lost or the ACK is lost

• Since packet sequence numbers alternate 0-1-0-1-etc., rdt3.0 is called an alternating-bit protocol



Reliable Data Transfer v3.0

Figure: rdt3.0 sender FSM with states “Wait for call 0 from above”, “Wait for ACK0”, “Wait for call 1 from above”, and “Wait for ACK1”. On rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer. In “Wait for ACK0”: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) is ignored; on timeout: udt_send(sndpkt); start_timer; on a good ACK0: stop_timer, then wait for call 1 from above. While waiting for a call from above, any rdt_rcv(rcvpkt) is ignored. The sequence-1 half mirrors this with make_pkt(1, data, checksum).




• How does the receiver FSM differ from rdt2.2? It doesn’t.
– The sender is responsible for loss detection

• Notice that, even allowing for lost packets, we still assume only one packet is outstanding at a time (sent, then delivered correctly, before the next is sent)

• But rdt3.0 still stops and waits for the ACK (or timeout) of each packet; fix this with pipelining
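To see why the alternating bit is enough, here is a small Python simulation of rdt3.0 over a channel that randomly loses packets and ACKs. It is a sketch under simplifying assumptions: the timer is abstracted away (every loss is assumed to be caught by a timeout, and there are no premature timeouts), bit errors are not modeled, and `abp_transfer` with its parameters is a hypothetical name.

```python
import random


def abp_transfer(messages, loss_rate=0.3, seed=1, max_sends=10_000):
    """Simulate rdt3.0 (alternating-bit) over a channel that can lose
    packets and ACKs. Each loss is assumed to be caught by the sender's
    timeout, which triggers a retransmission. Returns (delivered, sends)."""
    rng = random.Random(seed)

    def lost():
        return rng.random() < loss_rate

    delivered = []  # receiver's deliver_data output
    expected = 0    # receiver: sequence number it is waiting for
    seq = 0         # sender: sequence bit of the current packet
    sends = 0
    for data in messages:
        while True:
            sends += 1  # udt_send(make_pkt(seq, data, checksum)); start_timer
            if sends > max_sends:
                raise RuntimeError("too many retransmissions")
            if lost():
                continue  # packet lost: timeout, retransmit
            if seq == expected:  # new packet: deliver it
                delivered.append(data)
                expected ^= 1
            # Either way the receiver ACKs with this packet's sequence number.
            if lost():
                continue  # ACK lost: timeout, retransmit (a duplicate)
            break         # got the ACK for seq: stop_timer
        seq ^= 1          # alternate 0-1-0-1...
    return delivered, sends


msgs = ["m0", "m1", "m2", "m3", "m4"]
delivered, sends = abp_transfer(msgs)
print(delivered == msgs, sends >= len(msgs))  # True True
```

Whatever the loss pattern, each message is delivered exactly once: a retransmission caused by a lost ACK carries the old sequence bit, so the receiver re-ACKs it without delivering it again.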



Pipelined RDT

• Suppose we implemented rdt3.0 between NYC and LA
– A distance of 3000 miles gives an RTT of about 30 ms
– If the transmission rate is 1 Gbps, and packets are 1 kB (8 kb)
• Transmission time is therefore only 8 kb / 10^9 b/s = 8 microseconds (μs)
– Even if ACK messages are very small (transmission time about zero), the time for one packet to be sent and ACKed is 30.008 ms



Pipelined RDT

• Hence we’re transmitting 0.008 ms out of the 30.008 ms, which equals about 0.03% utilization
– How a protocol is implemented drastically affects its usefulness!

• It makes sense to send multiple packets and keep track of the ACKs for each
– Methods to do so are Go-Back-N (GBN) and Selective Repeat (SR)
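The NYC-to-LA arithmetic can be checked with a few lines of Python. The function name `sender_utilization` and its `packets_per_rtt` parameter are hypothetical; the formula is the fraction of each cycle (RTT plus one transmission time) that the sender spends transmitting, capped at 100%.

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, packets_per_rtt=1):
    """Fraction of time the sender is busy transmitting.
    Stop-and-wait sends 1 packet per (RTT + transmission time); a pipelined
    sender can send several before the first ACK comes back."""
    d_trans = packet_bits / rate_bps
    return min(1.0, packets_per_rtt * d_trans / (rtt_s + d_trans))


u = sender_utilization(8_000, 1e9, 0.030)
print(f"stop-and-wait: {u:.5f} ({u * 100:.3f}%)")
u3 = sender_utilization(8_000, 1e9, 0.030, packets_per_rtt=3)
print(f"3 packets per RTT: {u3 * 100:.3f}%")
```

Stop-and-wait comes out at about 0.027%, which rounds to the 0.03% above; sending 3 packets per RTT roughly triples it, which is the motivation for pipelining.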


Go-Back-N

• In this protocol, the sender can send up to N packets without getting an ACK*

• N is also called a window size, and the protocol is a.k.a. a sliding-window protocol
– Let base be the number of the first (oldest un-ACKed) packet in the window
– The window size, N, is already defined
– Then all packets from 0 to base-1 have already been sent and ACKed

* Why a limit at all? Need for flow and congestion control later.



Go-Back-N

– The window currently covers packets number base to base+N-1; these packets can be sent before their ACKs are received

• Packet sequence numbers need to have a maximum value; if ‘k’ bits are in the sequence number, the range of sequence numbers is 0 to 2^k - 1
– The sequence numbers are used in a circle, so after 2^k - 1 you use 0 again, then 1, etc.



Go-Back-N

– rdt3.0 only had sequence numbers 0 and 1
– TCP has a 32-bit sequence number range for the bytes in a byte stream

• In the FSMs for Go-Back-N (GBN)
– The sender must respond to:
• A call from above (i.e. the app)
• Receipt of an ACK; an ACK for packet n is a cumulative acknowledgement that all packets up to and including n were received
• Timeout, which causes all sent but un-ACKed packets to be re-sent



Go-Back-N

• The GBN receiver does:
– If a packet is correct and in order, send an ACK
• The sender moves the window up with each correct, in-order packet ACKed; this minimizes resending later
– In all other cases, throw away the packet, and resend the ACK for the most recent correct, in-order packet

• Hence we throw away correct but out-of-order packets; this makes receiver buffering easier



Go-Back-N

• GBN can be implemented in event-based programming; events here are
– The app invokes rdt_send
– The receiver protocol receives rdt_rcv
– Timer interrupts
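The three GBN sender events can be sketched as methods of one Python class. This is a hypothetical illustration, not the textbook's FSM verbatim: packets are counted internally with unbounded integers and only the k-bit value n % 2^k "goes on the wire", the timer is reduced to a `timer_on` flag, and the channel is assumed not to reorder ACKs.

```python
class GBNSender:
    """Event-driven Go-Back-N sender sketch. Assumes N < 2**k."""

    def __init__(self, N, k, udt_send):
        assert N < 2 ** k, "window must be smaller than the sequence space"
        self.N, self.mod = N, 2 ** k
        self.base = 0         # oldest sent-but-unACKed packet
        self.nextseq = 0      # next packet to send
        self.unacked = {}     # packet count -> data, kept for resending
        self.udt_send = udt_send
        self.timer_on = False

    def rdt_send(self, data):
        """Event: app calls rdt_send. Returns False if the window is full."""
        if self.nextseq >= self.base + self.N:
            return False
        self.unacked[self.nextseq] = data
        self.udt_send(self.nextseq % self.mod, data)
        if self.base == self.nextseq:
            self.timer_on = True  # start_timer for the oldest packet
        self.nextseq += 1
        return True

    def ack_received(self, acknum):
        """Event: cumulative ACK for k-bit sequence number acknum."""
        offset = (acknum - self.base) % self.mod
        if offset >= self.nextseq - self.base:
            return  # duplicate or old ACK: ignore
        for n in range(self.base, self.base + offset + 1):
            del self.unacked[n]
        self.base += offset + 1  # slide the window forward
        self.timer_on = self.base != self.nextseq  # stop, or restart timer

    def timeout(self):
        """Event: timer expires. Go back N: resend every unACKed packet."""
        self.timer_on = True
        for n in range(self.base, self.nextseq):
            self.udt_send(n % self.mod, self.unacked[n])


log = []
s = GBNSender(N=4, k=3, udt_send=lambda seq, d: log.append((seq, d)))
print([s.rdt_send(c) for c in "abcde"])  # [True, True, True, True, False]
s.ack_received(1)  # cumulative ACK: packets 0 and 1 arrived
s.timeout()        # go back N: resend packets 2 and 3
print(log[-2:])    # [(2, 'c'), (3, 'd')]
```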

• In contrast, consider the selective repeat (SR) approach for pipelining



Selective Repeat

• A large window size and bandwidth-delay product can put a lot of packets in the pipeline under GBN, which can cause a lot of needless retransmission when one packet is lost

• Selective repeat only retransmits packets believed to be in error – so retransmission is on a more individual basis

• To do this, buffer out-of-order packets until the missing packets are filled in



Selective Repeat

• SR still uses a window of size N packets
• The SR sender responds to:
– Data from the app above it: finds the next available sequence number, and sends as soon as possible if that number is within the window
– Timeout, which is kept for each packet: resend only that packet
– An ACK received from the receiver: the sender marks off that packet, and if it was the oldest un-ACKed packet (the window base), moves the window forward; it can then transmit packets inside the new window



Selective Repeat

• The SR receiver responds to
– A correctly received packet within the current window: send an ACK for it; if it’s at the bottom of the window, deliver it and any buffered packets that follow it in order, otherwise buffer it (out of order)
– Packets that were previously ACKed (from the previous window) are ACKed again
– Otherwise ignore the packet

• Notice the sender and receiver windows are generally not the same!!



Selective Repeat

• It’s possible that the sequence number range and window size could be too close, producing confusing signals
– To prevent this, need window size <= half of the sequence number range
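The window-size rule can be checked by brute force. This Python sketch (the function name `sr_window_is_safe` is hypothetical) tests the worst case: the receiver got and ACKed a full window, slid its own window forward, but every ACK was lost, so the sender retransmits the old packets. If any retransmitted sequence number also lies in the receiver's new window, the receiver could mistake an old packet for new data.

```python
def sr_window_is_safe(window: int, seq_space: int) -> bool:
    """Worst case for Selective Repeat: the receiver got and ACKed packets
    0..window-1 and slid its window, but every ACK was lost, so the sender
    times out and resends them. Safe only if no resent sequence number
    also falls in the receiver's new window (else it looks like new data)."""
    resent = {n % seq_space for n in range(window)}
    receiver_expects = {n % seq_space for n in range(window, 2 * window)}
    return not (resent & receiver_expects)


# Sequence numbers 0..3 (seq_space = 4) with a window of 3 is ambiguous:
print(sr_window_is_safe(3, 4))  # False
print(sr_window_is_safe(2, 4))  # True
print([w for w in range(1, 9) if sr_window_is_safe(w, 8)])  # [1, 2, 3, 4]
```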



Packet Reordering

• Our last assumption was that packets arrive in order, if at all
– What if they arrive out of order?

• Out-of-order packets could have sequence numbers outside of either window (snd or rcv)

• Handle by not reusing a sequence number until any packets carrying it are older than some max lifetime
– TCP typically assumes about 3 minutes



Reliable Data Transfer Mechanisms

– Checksum, to detect bit errors in a packet
– Timer, to know when a packet or its ACK was lost
– Sequence number, to detect lost or duplicate packets
– Acknowledgement, to know a packet got to the receiver correctly
– Negative acknowledgement, to tell that a packet was received but corrupted
– Window, to pipeline many packets at once before an ACK is received for any of them



TCP Intro

• Now see how all this applies to TCP
– First defined in RFC 793; RFC 2581 later updated its congestion control
– Invented circa 1974 by Vint Cerf and Robert Kahn

• TCP starts with a handshake protocol, which sets up many connection variables
– The connection exists only at the hosts, not in between
– Routers are oblivious to whether TCP is used!

• TCP is a full duplex service – data can flow both directions at once, and is connection-oriented



TCP Intro

• TCP is point-to-point: between a single sender and a single receiver
– In contrast with multipoint technologies

• TCP is client/server based
• The client needs to establish a socket to the server’s hostname and port
– Recall default port numbers are app-specific
– Special segments are sent by the client, the server, and the client again to make the three-way handshake



TCP Intro

• Once connection exists, processes can send data back and forth

• The sending process sends data through its socket to the TCP send buffer
– TCP sends data from the send buffer when it feels like it
– The Max Segment Size (MSS) is based on the largest link-layer frame, the Max Transmission Unit (MTU)
– Want 1 TCP segment (plus headers) to eventually fit in the MTU



TCP Intro

– Typical MSS values are 512 to 1460 bytes (Ethernet’s MTU is 1500 bytes)

• MSS is the max app data that can fit in a segment, not the total segment size (which includes headers)

• TCP adds headers to the data, creating TCP segments
– Segments are passed to the network layer to become IP datagrams, and so on into the network



TCP Intro

• At the receiving side, the segment is placed in the receive buffer

• So a TCP connection consists of buffers (send and receive) and some variables at each host, plus a socket on each of the two connected processes



TCP Segment Structure

• A TCP segment consists of header fields and a data field
– The data field size is limited by the MSS

• Typical header size is 20 bytes
– The header is 32 bits (4 bytes) wide, so it has five lines at a minimum



TCP Header Structure

• The header lines are
– Source and destination port numbers (16 bits each)
– Sequence number (32 bits)
– ACK number (32 bits)
– A bunch of little stuff (header length, URG, ACK, PSH, RST, SYN, and FIN bits), then the receive window (16 bits)
– Internet checksum, urgent data pointer (16 bits each)
– And possibly several options




• We’ve seen the port numbers (16 bits each), sequence and ACK numbers (32 bits each)

• The ‘bunch of little stuff’ includes
– Header length (4 bits), counted in 32-bit lines
– A flag field with six one-bit flags: ACK, RST, SYN, FIN, PSH, and URG
• The URG bit marks that the segment contains urgent data

• The receive window is used for flow control




• The checksum is used for bit error detection, as with UDP
– The urgent data pointer tells where the urgent data is located

• The options include negotiating the MSS, scaling the window size, or time stamping
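The five-line layout can be made concrete with Python's standard `struct` module. This is a sketch that follows the six-flag layout on these slides (as in RFC 793, with 6 reserved bits left at zero; later RFCs reuse some reserved bits for ECN); `tcp_header` and `TCP_FLAGS` are hypothetical names, and the checksum is left at 0 rather than computed.

```python
import struct

# Flag bit positions in the low 6 bits of header bytes 12-13 (as in RFC 793).
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def tcp_header(src_port, dst_port, seq, ack, flags=(), rcv_window=65535,
               checksum=0, urgent_ptr=0):
    """Pack a minimal 20-byte TCP header (five 32-bit lines, no options)."""
    header_len = 5  # the header length field counts 32-bit lines
    flag_bits = 0
    for name in flags:
        flag_bits |= TCP_FLAGS[name]
    # 4-bit header length, 6 reserved bits (zero here), 6 flag bits
    len_and_flags = (header_len << 12) | flag_bits
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       len_and_flags, rcv_window, checksum, urgent_ptr)


hdr = tcp_header(1234, 23, seq=42, ack=79, flags=("ACK",))
print(len(hdr))      # 20
print(hdr[12] >> 4)  # 5 (header length in 32-bit lines)
```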



TCP Sequence Numbers

• The sequence numbers are important for TCP’s reliability

• TCP views data as unstructured but ordered stream of bytes

• Hence the sequence number for a segment is the byte-stream number of the first byte in the segment – Yes, each byte is counted!



TCP Sequence Numbers

• So if the MSS is 1000 bytes, the first segment will be number 0, and cover bytes 0 to 999
– The second segment is number 1000, and covers bytes 1000 to 1999
– The third is number 2000, and covers 2000 to 2999, etc.

• In practice, both sides typically start their sequence numbers at random values, to avoid accidental overlap with previously used numbers
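The numbering rule can be written as a one-line Python function. The name `segment_sequence_numbers` and the `isn` (initial sequence number) parameter are hypothetical; it returns (sequence number, length) pairs and wraps to 32 bits, matching TCP's 32-bit sequence number field.

```python
def segment_sequence_numbers(data_len, mss, isn=0):
    """Split a byte stream into MSS-sized segments and return
    (sequence number, length) pairs. Each segment's sequence number is the
    stream number of its first byte, offset by the initial sequence number
    (isn) and wrapped to 32 bits."""
    return [((isn + start) % 2**32, min(mss, data_len - start))
            for start in range(0, data_len, mss)]


print(segment_sequence_numbers(3000, 1000))
# [(0, 1000), (1000, 1000), (2000, 1000)]
```

With a random starting value such as isn=42, the same stream would be numbered 42, 1042, 2042, and so on.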



TCP Acknowledgement No.

• TCP acknowledgement numbers are weird
• The number used is the next byte number expected from the sender
– So if host B sends to A (!) bytes 0-535 of data, host A expects byte 536 to be the start of the next segment, so 536 is the ACK number

• This is a cumulative acknowledgement, since it only goes up to the first missing byte in the byte stream
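A minimal Python sketch of how a receiver could compute that cumulative ACK number from the segments it holds. The function name `cumulative_ack` is hypothetical, segments are given as (sequence number, length) pairs, and 32-bit wraparound is ignored for simplicity.

```python
def cumulative_ack(received, isn=0):
    """Given the (seq, length) pairs of segments received so far, in any
    order, return the ACK number: the first byte not yet received in order.
    Ignores 32-bit wraparound for simplicity."""
    next_expected = isn
    for seq, length in sorted(received):
        if seq > next_expected:
            break  # gap: a byte before this segment is still missing
        next_expected = max(next_expected, seq + length)
    return next_expected


print(cumulative_ack([(0, 536)]))                          # 536
print(cumulative_ack([(0, 536), (900, 100)]))              # 536 (536..899 missing)
print(cumulative_ack([(536, 364), (0, 536), (900, 100)]))  # 1000
```

The second call shows the "cumulative" part: even though bytes 900 to 999 arrived, the ACK stays at 536 until the gap is filled.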



TCP Out-of-Order Segments

• What does TCP do when segments arrive out of order?
– That’s up to the TCP implementer

• TCP can either discard out-of-order segments, or keep the strays in a buffer and wait for the gaps to get filled in
– The former is easier to implement; the latter is more efficient and commonly used



Telnet Example

• Telnet (RFC 854) is an old app for remote login via TCP

• Telnet interactively echoes whatever was typed to show it got to the other side

• Host A is the client and starts a session with Host B, the server
– Suppose the client starts with sequence number 42, and the server with 79



Telnet Example

Figure: Simple telnet scenario between Host A and Host B, with time running downward. The user types ‘C’; Host A sends Seq=42, ACK=79, data = ‘C’. Host B ACKs receipt of ‘C’ and echoes back ‘C’ (Seq=79, ACK=43, data = ‘C’). Host A ACKs receipt of the echoed ‘C’ with Seq=43, ACK=80.

• The user types a single letter, ‘C’

• Notice how the seq and ACK numbers mirror each other, and how the server’s ACK for ‘C’ “piggybacks” on the segment that carries the echoed ‘C’



Timeout Calculation

• TCP needs a timeout interval, as discussed in the rdt example, but how long?
– Longer than the RTT, but how much longer? A week?

• Measure a sample RTT for segments here and there (not every one)
– This SampleRTT value will fluctuate, so TCP keeps an average value called EstimatedRTT, a moving average updated with each measurement



Timeout Calculation

– Naturally, EstimatedRTT is a smoother curve than the individual SampleRTT values

• EstimatedRTT = 0.875*EstimatedRTT + 0.125*SampleRTT

• The variability of RTT is measured by DevRTT, a moving average of the magnitude of the difference between SampleRTT and EstimatedRTT
– Let DevRTT = 0.75*DevRTT + 0.25*|SampleRTT - EstimatedRTT|



Timeout Calculation

• We want the timeout interval larger than EstimatedRTT, but not huge; use
– TimeoutInterval = EstimatedRTT + 4*DevRTT

• This is analogous to control charts, where a measurement falls outside the mean ± 3*(the standard deviation) only about ¼% of the time
– DevRTT isn’t a standard deviation, but the idea is similar


Timeout Calculation

• Notice this means that the timeout interval is constantly being recalculated, which requires frequent measurement of SampleRTT to find current values for:
– EstimatedRTT
– DevRTT
– TimeoutInterval
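The three formulas can be combined in a small Python class. This sketch assumes the update order and first-sample seeding described in RFC 6298 (DevRTT is updated using the old EstimatedRTT; the first sample sets EstimatedRTT = SampleRTT and DevRTT = SampleRTT/2); the class name `RttEstimator` is hypothetical, and RFC 6298's 1-second minimum timeout is left out.

```python
class RttEstimator:
    """Sketch of the EstimatedRTT / DevRTT / TimeoutInterval updates."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:  # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT = 0.75*DevRTT + 0.25*|SampleRTT - EstimatedRTT|
            self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                            + self.beta * abs(sample_rtt - self.estimated_rtt))
            # EstimatedRTT = 0.875*EstimatedRTT + 0.125*SampleRTT
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
for sample in [100, 120, 90]:  # SampleRTT values in ms
    print(est.update(sample))   # 300.0, then 272.5, then 240.9375
```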




• IP is not a reliable datagram service
– It doesn’t guarantee delivery, in-order delivery, or intact delivery

• In theory we saw that separate timers for each segment would be nice; in reality TCP uses one retransmission timer for several segments (RFC 2988)

• For the next example, assume Host A is sending a big file to Host B



Simplified TCP

• Here the sender responds to three events:
– Receive data from application
• Then it makes a segment of the data, with a sequence number, and passes it to the IP layer
• Starts the timer, if it isn’t already running

– Timer times out
• Then it re-sends the segment that timed out (the oldest un-ACKed one) and restarts the timer

– ACK was received
• Compares the received ACK value with SendBase, the sequence number of the oldest un-ACKed byte; a larger ACK value means more data arrived, so SendBase moves up
• Restarts the timer if any un-ACKed segments are left



Simplified TCP

• Even this version of TCP can successfully handle lost ACKs; the receiver discards the resulting duplicate segments (Fig 3.34, p. 256)

• If a segment times out but a cumulative ACK arrives before the next timeout, later segments don’t get re-sent (Fig 3.35, p. 257)

• A lost ACK can still be deduced not to mean a lost segment, when a later cumulative ACK covers it (Fig 3.36, p. 258)



Doubling Timeout

• After a timeout event, many TCP implementations double the timeout interval

• This helps with congestion control, since timeout is often due to congestion, and retransmitting often just makes it worse!



Fast Retransmit

• Waiting for the timeout can be too slow
• Might know to retransmit sooner if we get duplicate ACKs
– A duplicate ACK for a given byte number means a gap was noted in the segment sequence (since TCP has no NAKs)

• Getting three duplicate ACKs typically triggers a fast retransmit of the segment starting at that byte number
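A minimal Python sketch of the duplicate-ACK counting behind fast retransmit. The class name `FastRetransmitDetector` is hypothetical, and the sketch assumes ACKs reach the sender in order; a real TCP sender combines this with the retransmission timer and congestion control, which are not modeled.

```python
class FastRetransmitDetector:
    """Counts duplicate ACKs at the sender; on the third duplicate of the
    same ACK number, reports that the segment starting at that byte should
    be fast-retransmitted (without waiting for the timer)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_num):
        """Return the byte number to resend now, or None."""
        if ack_num == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack_num
        else:  # a new cumulative ACK: reset the count
            self.last_ack = ack_num
            self.dup_count = 0
        return None


d = FastRetransmitDetector()
print([d.on_ack(a) for a in [100, 200, 200, 200, 200, 200]])
# [None, None, None, None, 200, None]
```

The first ACK for 200 is the original; the third duplicate after it triggers the resend of the segment starting at byte 200, and later duplicates don't trigger it again.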



Go-Back-N vs. Selective Repeat?

• TCP partly looks like Go-Back-N (GBN)
– Tracks the smallest sequence number transmitted but not ACKed (SendBase) and the sequence number of the next byte to send (NextSeqNum)

• TCP partly looks like Selective Repeat (SR)
– Often buffers out-of-order segments to limit the range of segments retransmitted
– TCP can use selective acknowledgment (RFC 2018) to tell the sender which out-of-order segments were received



Flow Control

• Each host on a TCP connection maintains a receive buffer for bytes received correctly and in order
– Apps might not read from the buffer for a while, so it can overflow

• Flow control focuses on preventing overflow of the receive buffer
– So it also depends on how fast the receiving app is reading the data!


Flow Control

• Hence TCP has the sender maintain a receive window (RcvWindow) variable: how much room is left in the receiver’s buffer
– The receive buffer has size RcvBuffer
– The last byte number read by the receiving app is LastByteRead
– The last byte number put in the receive buffer is LastByteRcvd
– RcvWindow = RcvBuffer – (LastByteRcvd – LastByteRead), also written rwnd



Flow Control

• So the amount of room in RcvWindow varies with time, and the receiver returns it to the sender in the receive window field of every segment (see slide 73)
– The sender also keeps track of LastByteSent and LastByteAcked; the difference between them is the amount of unACKed data in flight between sender and receiver

• Keep that difference no larger than RcvWindow to make sure the receive buffer isn’t overflowed

• LastByteSent – LastByteAcked <= RcvWindow



Flow Control

• If RcvWindow drops to zero and the receiver has nothing to send back, the sender never learns when space frees up, so it could never send more data to the receiver!

• To prevent this, TCP makes the sender transmit one-byte segments while RcvWindow is zero, so that the receiver’s ACKs can indicate when the buffer is no longer full
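Both sides of the flow-control bookkeeping fit in a short sketch. It makes several assumptions. Byte numbers count from 0, with 0 meaning "nothing yet". `FlowControlReceiver` and `sender_may_send` are hypothetical names. The zero-window case is reduced to "send a one-byte probe when nothing is in flight":

```python
class FlowControlReceiver:
    """Hypothetical receiver tracking the variables from the slides (in bytes)."""

    def __init__(self, rcv_buffer: int):
        self.rcv_buffer = rcv_buffer   # RcvBuffer
        self.last_byte_rcvd = 0        # LastByteRcvd
        self.last_byte_read = 0        # LastByteRead

    def rcv_window(self) -> int:
        # RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def on_segment(self, nbytes: int) -> int:
        """Store in-order data; return the RcvWindow value to advertise in the ACK."""
        if nbytes > self.rcv_window():
            raise OverflowError("sender ignored RcvWindow")
        self.last_byte_rcvd += nbytes
        return self.rcv_window()

    def app_reads(self, nbytes: int) -> None:
        """The application drains up to nbytes from the buffer."""
        available = self.last_byte_rcvd - self.last_byte_read
        self.last_byte_read += min(nbytes, available)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> int:
    """How many more bytes keep LastByteSent - LastByteAcked <= RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    if rcv_window == 0 and in_flight == 0:
        return 1   # zero window: one-byte probe so an ACK can report new space
    return max(0, rcv_window - in_flight)


rx = FlowControlReceiver(rcv_buffer=4096)
print(rx.on_segment(4096))                            # 0: buffer full, app hasn't read yet
print(sender_may_send(4096, 4096, 0))                 # 1: the probe byte
rx.app_reads(1024)
print(sender_may_send(4096, 4096, rx.rcv_window()))   # 1024
```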



UDP Flow Control

• There ain’t none (sic!)

• UDP adds newly arrived segments to a buffer in front of the receiving socket
– If the buffer gets full, segments are dropped
– Bye-bye data!



TCP Connection Management

• Now look at the TCP handshake in detail
– Important since many security threats exploit it

• Recall that the client process wants to establish a connection with a server process
– Step 1 – client sends a segment with the SYN bit set (SYN=1) and an initial sequence number (client_isn) to the server

• Choosing a random client_isn is key for security




– Step 2 – Server allocates variables needed for the connection, and sends a connection-granted segment, SYNACK, to the client

• This SYNACK segment has SYN=1, the ack field is set to client_isn+1, and the server chooses its initial sequence number (server_isn)

– Step 3 – Client gets SYNACK segment, and allocates its buffers and variables

• Client sends segment with ack value server_isn+1, and SYN=0
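The sequence and acknowledgment numbers in the three steps can be traced with a short sketch. The `Segment` class and `three_way_handshake` function are hypothetical, and only the fields discussed above are modeled. TCP sequence numbers are 32 bits, so they wrap around modulo 2^32:

```python
import secrets
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: int
    seq: int
    ack: Optional[int] = None   # None: ACK field not in use


def three_way_handshake():
    # Step 1: client picks a random ISN and sends SYN=1
    client_isn = secrets.randbelow(SEQ_SPACE)
    syn = Segment(syn=1, seq=client_isn)
    # Step 2: server picks its own ISN and ACKs client_isn + 1 (the SYN uses one number)
    server_isn = secrets.randbelow(SEQ_SPACE)
    synack = Segment(syn=1, seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: client ACKs server_isn + 1 with SYN=0; its own seq has moved past the SYN
    final_ack = Segment(syn=0, seq=(client_isn + 1) % SEQ_SPACE,
                        ack=(synack.seq + 1) % SEQ_SPACE)
    return syn, synack, final_ack


for step, seg in enumerate(three_way_handshake(), start=1):
    print(f"Step {step}: {seg}")
```

The ISNs are drawn from `secrets` rather than a counter. This reflects the point above that a predictable client_isn is a security risk.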




• The SYN bit stays 0 while the connection is open
– Why is a three-way handshake used?
– Why isn’t two-way enough?

• Now look at closing the connection
– Either client or server can close the connection




• One host, let’s say the client, sends a segment with the FIN bit set to 1

• The server acknowledges this with a return segment, then sends a separate shutdown segment (also with FIN=1)

• Client acknowledges the shutdown from the server, and resources in both hosts are deallocated



TCP State Cycle

• Another way to view the history of a TCP connection is through its state changes (Fig 3.41, 3.42)
– The connection starts Closed
– After the handshake is completed, it’s Established

• Then the processes communicate

– Sending or receiving a FIN=1 starts the closing process, until both sides get back to Closed

• Whoever sent the first FIN waits some period (30-120 s) after ACKing the other host’s FIN before closing their connection
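The state changes can be written as a pair of transition tables. One table is for a client that opens and later closes the connection; the other is for the server. The event names are invented for this sketch. The state names follow the usual TCP state-diagram labels, and `run` is a hypothetical helper:

```python
CLIENT = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "recv SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "wait 30-120 s"): "CLOSED",
}

SERVER = {
    ("CLOSED", "create listen socket"): "LISTEN",
    ("LISTEN", "recv SYN, send SYNACK"): "SYN_RCVD",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(table, events, state="CLOSED"):
    """Apply events in order; a KeyError means the event isn't allowed in that state."""
    states = [state]
    for event in events:
        state = table[(state, event)]
        states.append(state)
    return states


# Dicts keep insertion order, so the keys list the events in the normal sequence
client_path = run(CLIENT, [event for (_, event) in CLIENT])
print(" -> ".join(client_path))
```

Only the client path passes through TIME_WAIT, because in this scenario the client sent the first FIN.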



Stray Segments

• Receiving a SYN segment that tries to open an unknown or closed port results in:
– The server sending a reset segment with RST=1, meaning “go away, that port isn’t open”

• Similarly, a UDP packet whose destination port doesn’t match any socket results in the host sending a special ICMP datagram (see next chapter)



Stray Segments

• So mapping ports on a system could yield three responses
– Get a TCP SYNACK, implying the port is open and some app is using it
– Get a TCP RST segment, meaning the port is closed
– No response, implying the port could be blocked by a firewall
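Here is a rough Python version of this classification, using the standard `socket` module. Note one assumption: it does a full `connect()`, so an open port completes the whole handshake instead of stopping at the SYNACK the way a half-open SYN scan would. `probe_tcp_port` is a hypothetical helper, and any error other than refusal or timeout (e.g. host unreachable) is simply raised:

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 1.0) -> str:
    """Classify a TCP port by how a connect() attempt ends."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "open"                          # SYNACK came back: an app is listening
        except ConnectionRefusedError:
            return "closed"                        # RST came back: nothing listens there
        except socket.timeout:
            return "no response (maybe filtered)"  # silence, e.g. a firewall dropping SYNs


print(probe_tcp_port("127.0.0.1", 22))
```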



SYN Flood Attacks

• The TCP handshake is the basis for an attack called the SYN flood
– Have one or more computers send lots of SYN messages to a server, but spoof the return IP address so the connection is never finished
– This makes the server waste resources waiting for the handshakes to complete; it can crash the server if done fast enough
– A newer defense against this is the SYN cookie



SYN cookie

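• When a SYN segment is received, the server creates its initial sequence number (the cookie) as a hash of the source and destination IP addresses and port numbers, plus a secret number known only to the server
– It sets up nothing else!
– When it receives the client’s ACK, it recomputes the hash from that segment’s addresses and ports; if the ACK value equals the cookie + 1, the handshake is genuine, and only then does the server allocate the connection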
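A simplified SYN cookie sketch follows. It goes beyond the slide in several assumed details. The hash is an HMAC-SHA-256 keyed with a server-only secret, so outsiders can't forge cookies. It also covers a coarse 64-second time slot and is truncated to 32 bits. Real implementations additionally pack details such as the MSS into the sequence number. All names are hypothetical:

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET = secrets.token_bytes(16)   # known only to the server
SLOT_SECONDS = 64                  # cookies expire after one or two slots


def _cookie(src_ip, src_port, dst_ip, dst_port, slot: int) -> int:
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")   # fits the 32-bit sequence field


def _slot(now: Optional[float]) -> int:
    return int((time.time() if now is None else now) // SLOT_SECONDS)


def syn_cookie(src_ip, src_port, dst_ip, dst_port, now=None) -> int:
    """On SYN: use the cookie as server_isn in the SYNACK and store nothing."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, _slot(now))


def ack_is_valid(ack_num, src_ip, src_port, dst_ip, dst_port, now=None) -> bool:
    """On ACK: recompute the cookie; a genuine client ACKs cookie + 1."""
    isn = (ack_num - 1) % 2 ** 32
    slot = _slot(now)
    return any(isn == _cookie(src_ip, src_port, dst_ip, dst_port, s)
               for s in (slot, slot - 1))   # accept the current or previous slot


cookie = syn_cookie("10.0.0.1", 5555, "10.0.0.2", 80)
print(ack_is_valid(cookie + 1, "10.0.0.1", 5555, "10.0.0.2", 80))   # True
```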



Congestion Control

• Now address congestion control issues
– Congestion is a traffic jam somewhere in the middle of the network
– The most common cause is too many sources sending data too fast into the network



Congestion Control

• Key lessons from cases b and c are:
– A congested network forces retransmissions of packets lost to buffer overflow, which adds to the congestion
– A congested network can waste its bandwidth sending duplicate copies of packets that were delayed but not lost in the first place



Congestion Control

• (skipping the big messy example)

• The lesson is: dropping a packet wastes the transmission capacity of every upstream link that packet saw

• So what are our approaches for dealing with congestion?



Congestion Control Approaches

• Either the network provides explicit support for congestion control, or it doesn’t
– End-to-end congestion control is when the network doesn’t provide explicit support
• The presence of congestion is inferred from packet loss, delays, etc.
• Since TCP uses IP, this is our only option right now



Congestion Control Approaches

– Network-assisted congestion control is when network components (e.g. routers) provide congestion feedback explicitly

• IBM SNA, DECnet, and ATM use this, and proposals for improving TCP/IP have been made

• Network equipment may provide various levels of feedback

– Send a choke packet to tell the sender they’re full
– Flag existing packets to indicate congestion
– Tell what transmission rate the router can support at the moment



ATM ABR Congestion Control

• ATM Available Bit-Rate (ABR) is one method of network-assisted congestion control
– It uses a combination of virtual circuits (VC) and resource management (RM) cells (packets) to convey congestion information along the VC

– Data cells (packets) contain a congestion bit to prompt sending an RM cell back to the sender

– Other bits convey whether the congestion is mild (don’t increase traffic) or severe (back off) or tell the max rate supported along the circuit



TCP Congestion Control

• As noted, TCP uses end-to-end congestion control, since IP provides no congestion feedback to the end systems
– In TCP, each sender limits its send rate based on its perceived amount of congestion

• Each side of a TCP connection has a send buffer, receive buffer, and several variables

• Each side also has a congestion window variable, CongWin (or cwnd)




• The amount of unACKed data a sender may have outstanding, and so its max send rate, is limited by the minimum of CongWin and RcvWindow
– LastByteSent – LastByteAcked <= min(CongWin, RcvWindow)

• Assume for the moment that RcvWindow is large, so we can focus on CongWin
– If loss and transmission delay are small, CongWin bytes of data can be sent every RTT, for a send rate of roughly CongWin/RTT
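Here is a worked example with made-up numbers: a CongWin of 10 segments of 1,460 bytes, a 65,535-byte RcvWindow, and a 100 ms RTT. These values are assumptions for illustration, and the helper names are hypothetical:

```python
def send_limit(cong_win: int, rcv_window: int) -> int:
    """Max bytes in flight: LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    return min(cong_win, rcv_window)


def approx_rate_bps(window_bytes: int, rtt_s: float) -> float:
    """Rough send rate (bits/s) when loss and transmission delay are small: window / RTT."""
    return window_bytes * 8 / rtt_s


window = send_limit(cong_win=10 * 1460, rcv_window=65535)   # CongWin is the tighter limit
print(window, "bytes in flight,", approx_rate_bps(window, 0.1) / 1e6, "Mbps")
```

In this example the large RcvWindow doesn't matter: 14,600 bytes per 100 ms works out to about 1.17 Mbps. Doubling CongWin would double that rate.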




• Now address how to detect congestion

• Call it a “loss event” when a timeout occurs or three duplicate ACKs are received
– Congestion causes loss events in the network

• If there’s no congestion, lots of happy ACKs tell TCP to increase CongWin quickly, and hence the transmission rate
– Conversely, slow ACK arrival slows the increase of CongWin



• TCP is self-clocking, since it measures its own feedback (ACK receipt) to determine changes in CongWin

• Now look at how TCP defines its congestion control algorithm in three parts
– Additive-increase, multiplicative-decrease
– Slow start
– Reaction to timeout events



Additive-increase, Multiplicative-decrease

• When a loss event occurs, CongWin is halved (but never cut below 1 MSS), a process called multiplicative-decrease

• When there’s no perceived congestion, TCP increases CongWin slowly, adding 1 MSS each RTT – this is additive-increase

• Collectively they are the AIMD algorithm

Recall MSS = maximum segment size



AIMD Algorithm

• Over a long TCP connection with little congestion, AIMD produces slow rises in CongWin, each followed by a cut in half when a loss event occurs; repeated over time, this produces a grumpy sawtooth wave

[Figure: sawtooth plot of the congestion window (8, 16, 24 Kbytes) over time]
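The following toy trace reproduces the sawtooth. It assumes, hypothetically, that a loss event happens every time CongWin reaches the same size W, so the window cycles between W/2 and W. `aimd_trace` is a hypothetical helper:

```python
def aimd_trace(mss: int, start: int, loss_at: int, rtts: int) -> list[int]:
    """CongWin in bytes at the start of each RTT under pure AIMD."""
    cwnd, trace = start, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease, never below 1 MSS
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace


KB = 1024
sawtooth = aimd_trace(mss=1 * KB, start=12 * KB, loss_at=24 * KB, rtts=30)
print([c // KB for c in sawtooth])   # in Kbytes: climbs 12..24, drops back to 12, repeats
```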



Slow Start

• The initial send rate is typically 1 MSS/RTT, which is really slow

• To avoid a really long ramp up to a fast rate, CongWin is increased exponentially until the first loss event occurs
– CongWin doubles every RTT during slow start

• Then the AIMD algorithm takes over
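The doubling comes from adding 1 MSS to CongWin for every ACK received. The sketch below assumes no losses and one ACK per segment, and counts the window in segments. `slow_start` is a hypothetical helper:

```python
def slow_start(rtts: int) -> list[int]:
    """CongWin (in MSS) at the start of each RTT during loss-free slow start."""
    cwnd, history = 1, []
    for _ in range(rtts):
        history.append(cwnd)
        acks_this_rtt = cwnd     # every segment sent this RTT is ACKed
        cwnd += acks_this_rtt    # +1 MSS per ACK, so CongWin doubles each RTT
    return history


print(slow_start(7))   # [1, 2, 4, 8, 16, 32, 64]
```

Slow start reaches 64 MSS in 6 RTTs. Adding 1 MSS per RTT from the same starting point would take 63.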



Reaction to Timeout

• Timeouts are not handled the same as triple duplicate ACKs
– Triple duplicate ACKs are followed by: halve CongWin, then use the AIMD approach
– But true timeout events are handled differently

• The TCP sender returns to slow start, and if no problems occur, ramps up to half of the CongWin value from before the timeout occurred
– A variable Threshold stores the 0.5*CongWin value when a loss event occurs



Reaction to Timeout

– Once CongWin gets back to the Threshold value, it is allowed to increase linearly per AIMD

• So after a triple duplicate ACK, CongWin recovers faster (called a fast recovery, oddly enough) than after a timeout
– Why do this? Because the triple duplicate ACK shows that several later segments got there successfully, even if one was lost

– A timeout is a more severe congestion indicator, hence the slower recovery of CongWin



TCP Tahoe & Reno

• TCP Tahoe follows the timeout recovery pattern after any loss event
– Go back to CongWin = 1 MSS, ramp up exponentially until reaching Threshold, then follow AIMD

• TCP Reno introduced the fast recovery from triple duplicate ACKs (use this)
– After a triple-duplicate-ACK loss event, cut CongWin in half and resume linear increase until the next loss event; repeat
– After a timeout, Reno still goes back to slow start, like Tahoe



TCP Tahoe & Reno

Assumes a loss event at transmission round 8; shows how Tahoe and Reno respond differently.

New Threshold is 12/2 = 6 MSS
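The sketch below replays this scenario. It assumes an initial Threshold of 8 MSS, so CongWin reaches 12 MSS at round 8. It also uses the slides' simplified Reno, which drops straight to the new Threshold; real Reno briefly inflates the window during fast recovery. The function name and parameters are hypothetical:

```python
def simulate(variant: str, rounds: int, loss_rounds: set[int],
             threshold: float = 8.0) -> list[float]:
    """CongWin (in MSS) per transmission round; losses are triple duplicate ACKs."""
    cwnd, trace = 1.0, []
    for rnd in range(1, rounds + 1):
        trace.append(cwnd)
        if rnd in loss_rounds:
            threshold = cwnd / 2                      # Threshold = 0.5 * CongWin
            cwnd = 1.0 if variant == "tahoe" else threshold
        elif cwnd < threshold:
            cwnd = min(2 * cwnd, threshold)           # slow start, stopping at Threshold
        else:
            cwnd += 1                                 # congestion avoidance (AIMD)
    return trace


for name in ("tahoe", "reno"):
    print(name, simulate(name, 15, {8}))
```

Both variants record 12 MSS at round 8 and set Threshold to 6. Tahoe then restarts at 1 MSS (1, 2, 4, 6, ...), while Reno resumes linear growth from 6 MSS.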



TCP Throughput

• Other variations exist, e.g. TCP Vegas

• If the sawtooth pattern continues, with a loss event consistently occurring at the same congestion window size, then the average throughput (rate) is
– Average throughput = 0.75*W/RTT, where W is the CongWin size when the loss event occurs
– The window climbs linearly from W/2 to W, so it averages 3W/4
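Here is a numeric check of where 0.75 comes from, plus a worked example. The values W = 100 segments of 1,460 bytes and RTT = 100 ms are made up, and `average_throughput` is a hypothetical helper:

```python
def average_throughput(w_bytes: float, rtt_s: float) -> float:
    """Average rate (bytes/s) when the window cycles linearly from W/2 up to W."""
    return 0.75 * w_bytes / rtt_s


# Window (in segments) over one sawtooth cycle for W = 100: 50, 51, ..., 100
cycle = range(50, 101)
print(sum(cycle) / len(cycle))                       # 75.0, i.e. 0.75 * W

rate = average_throughput(100 * 1460, 0.1)
print(f"{rate * 8 / 1e6:.2f} Mbps")                  # 8.76 Mbps
```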



TCP Future

• TCP will keep changing to meet the needs of the Internet

• Obviously, many critical Internet apps depend on TCP, so there are always changes being proposed
– See the RFC Index for current ideas

• For example, many want to support very high data rates (e.g. 10+ Gbps)



TCP Future

• To support that rate with 1,500-byte segments and a 100 ms RTT, the congestion window would have to be 83,333 segments
– And not lose any of them!

• If we know the loss rate (L) and MSS, we can derive
– Average throughput = 1.22*MSS/(RTT*sqrt(L))

• For 10 Gbps throughput, we need L of about 2×10^-10, or to lose only one segment in five billion!
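The arithmetic behind both numbers is shown below. It assumes 1,500-byte segments and a 100 ms RTT, the values that yield the 83,333-segment window. The helper names are hypothetical:

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def loss_rate_needed(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS converted to bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
L = loss_rate_needed(10e9, 0.1, 1500)
print(f"window ~ {w:,.0f} segments, L ~ {L:.2e} (one loss per {1 / L:,.0f} segments)")
```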



Fairness

• If a router has multiple connections competing for bandwidth, is it fair in sharing?

• If two TCP connections of equal MSS and RTT are sharing a router, and both are primarily in AIMD mode, the throughput for each connection will tend to balance fairly, with cyclical changes in throughput due to changes in CongWin after packet drops
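The following is a toy model of that convergence. Two flows with the same RTT (one round each) share a link of capacity C. Each flow adds 1 unit per round, and both halve whenever their sum exceeds C. For simplicity it assumes a drop hits both flows at once. `aimd_two_flows` is a hypothetical helper:

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Throughputs (arbitrary units) of two AIMD flows sharing one bottleneck."""
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # both see loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1      # no loss: additive increase
        history.append((x1, x2))
    return history


h = aimd_two_flows(5.0, 45.0, capacity=100.0, rounds=500)
print(h[0], h[-1])
```

Additive increase leaves the gap between the two flows unchanged, while each halving cuts it in half. After repeated loss events the two throughputs end up nearly equal.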



Fairness

• More realistically, unequal connections are less fair
– Lower RTT gets more bandwidth (CongWin increases faster)
– UDP traffic can force out the more polite TCP traffic
– Multiple TCP connections from a single host (e.g. from downloading many parts of a Web page at once) get more bandwidth



Are We Done Yet?

• So we’ve covered transport layer protocols from the terribly simple UDP to a seemingly exhaustive study of TCP
– Key features along the way include multiplexing/demultiplexing, error detection, acknowledgements, timers, retransmissions, sequence numbers, connection management, flow control, end-to-end congestion control

– So much for the “edge” of the Internet; next is the network layer, to start looking at the core

