Transport Layer ECE544: Communication Networks-II, Spring 2013

56
Tam Vu WINLAB, Dept. of Computer Science Rutgers University Transport Layer ECE544: Communication Networks-II, Spring 2013 hing materials from, L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mi

description

Transport Layer ECE544: Communication Networks-II, Spring 2013. Tam Vu WINLAB, Dept. of Computer Science Rutgers University. Includes teaching materials from, L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mike Freedman. IP Protocol Stack: Key Abstractions. Application. - PowerPoint PPT Presentation

Transcript of Transport Layer ECE544: Communication Networks-II, Spring 2013

Page 1: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Tam VuWINLAB, Dept. of Computer

ScienceRutgers University

Transport Layer ECE544: Communication Networks-II, Spring

2013

Includes teaching materials from, L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mike Freedman

Page 2: Transport Layer  ECE544: Communication Networks-II, Spring 2013

IP Protocol Stack: Key Abstractions

2

Problem: Network Layer (IP) provides only best-effort communication services

Best-effort local packet delivery

Best-effort global packet delivery

Reliable streams

Applications

Messages

Link

Network

Transport

Application

Page 3: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Applications requirements vs. IP layer limitationsGuarantee message delivery

Network may drop messages.Deliver messages in the same order they are sent

Messages may be reordered in networks and incurs a long delay.

Delivers at most one copy of each messageMessages may duplicate in networks.

Support arbitrarily large messageNetwork may limit message size.

Support synchronization between sender and receiver

Allows the receiver to apply flow control to the sender

Support multiple application processes on each hostNetwork only support communication between hosts

Many more

Page 4: Transport Layer  ECE544: Communication Networks-II, Spring 2013

IP Protocol Stack: Key Abstractions

4

Transport layer:Provide applications with good abstractionsWithout support or feedback from the network Is the lowest layer in the network stack that is an end-to-

end protocol

Best-effort local packet delivery

Best-effort global packet delivery

Reliable streams

Applications

Messages

Link

Network

Transport

Application

Page 5: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Transport Protocols

5

Logical communication between processesSender divides a message into segmentsReceiver reassembles segments into message

Transport services(De)multiplexing packetsDetecting corrupted dataOptionally: reliable delivery, flow control, …

Page 6: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Two Basic Transport FeaturesDemultiplexing: port numbers

Error detection: checksums

Web server(port 80)

Client host

Server host 128.2.194.242

Echo server(port 7)

Service request for128.2.194.242:80(i.e., the Web server)

OSClient

IP payload

detect corruption6

Page 7: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Most Popular Transport ProtocolsUser Datagram Protocol (UDP)

Support multiple applications processes on each host

Option to check messages for correctness with CRC check

Transmission Control Protocol (TCP) Ensures reliable delivery of packets between

source and destination processes Ensures in-order delivery of packets to

destination process Other services

Real Time Protocol (RTP) Serves real-time multimedia applicationsMoves decision making to the applications Runs over UDP

Page 8: Transport Layer  ECE544: Communication Networks-II, Spring 2013

User Datagram Protocol (UDP)

Service: Support for multiple processes on each host to communicate Issue: IP only provides communication between hosts (IP addresses)

SolutionAdd port number and associate a process with a port number4-Tuple Unique Connection Identifier: [SrcPort, SrcIPAddr, DestPort,

DestIPAddr ]

Lightweight communication between processesSend and receive messagesAvoid overhead of ordered, reliable delivery

No connection setup delay, in-kernel connection state

Used by popular appsQuery/response for DNSReal-time data in VoIP

SrcPort DesPort

Length Checksum

Payload

0 16 31

Page 9: Transport Layer  ECE544: Communication Networks-II, Spring 2013

User Datagram Protocol (UDP): Error Detection

Service: Ensure message correctnessIssue: Packet corruption in transit

SolutionUse Checksum. Includes UDP header, payload, pseudo headerPseudo header

Protocol number, source IP address, destination IP address, and UDP length

SrcPort DesPort

Length Checksum

Payload

0 16 31

Page 10: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Transmitting a stream of bytes ?

Stream-of-bytes serviceSends and receives a

stream of bytes

Reliable, in-order deliveryCorruption: checksumsDetect loss/reordering:

sequence numbersReliable delivery:

acknowledgments and retransmissions

Connection orientedExplicit set-up and

tear-down of TCP connection

Flow controlPrevent overflow of

the receiver’s buffer space

Congestion controlAdapt to network

congestion for the greater good

11

Page 11: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Transmission Control Protocol (TCP) First proposed by Vinton Cerf and Robert

Kahn, 1974TCP/IP enabled computers of all sizes, from

different vendors, different OSs, to communicate with each other.

Used by 80% of all traffic on the InternetReliable, in-order delivery, connection-

oriented, bye-stream service

Page 12: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Starting and Ending a Connection:

TCP Handshakes

Page 13: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Establishing a TCP Connection

Three-way handshake to establish connectionHost A sends a SYN (open) to the host BHost B returns a SYN acknowledgment (SYN ACK)Host A sends an ACK to acknowledge the SYN

ACK14

SYN

SYN

ACKACK

Data

A B

Data

Each host tells its Initial Sequence Number (ISN) to the other host.

Each host tells its Initial Sequence Number (ISN) to the other host.

Page 14: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP Header

15

Source port Destination port

Sequence number

Acknowledgment

Advertised windowHdrLen

Flags0

Checksum Urgent pointer

Options (variable)

Data

Flags:SYNFINRSTPSHURGACK

Page 15: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Step 1: A’s Initial SYN Packet

16

A’s port B’s port

A’s Initial Sequence Number

Acknowledgment

Advertised window20 Flags0

Checksum Urgent pointer

Options (variable)

Flags:SYNFINRSTPSHURGACK

A tells B it wants to open a connection…

Page 16: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Step 2: B’s SYN-ACK Packet

B’s port A’s port

B’s Initial Sequence Number

A’s ISN plus 1

Advertised window20 Flags0

Checksum Urgent pointer

Options (variable)

Flags:SYNFINRSTPSHURGACK

B tells A it accepts, and is ready to hear the next byte…

… upon receiving this packet, A can start sending data

17

Page 17: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Step 3: A’s ACK of the SYN-ACK

A’s port B’s port

B’s ISN plus 1

Advertised window20 Flags0

Checksum Urgent pointer

Options (variable)

Flags:SYNFINRSTPSHURGACK

A tells B it is okay to start sending

Sequence number

… upon receiving this packet, B can start sending data

18

Page 18: Transport Layer  ECE544: Communication Networks-II, Spring 2013

SYN Loss and Web Downloads

Upon sending SYN, sender sets a timerIf SYN lost, timer expires before SYN-ACK receivedSender retransmits SYN

How should the TCP sender set the timer?No idea how far away the receiver isSome TCPs use default of 3 or 6 seconds

Implications for web downloadUser gets impatient and hits reload … Users aborts connection, initiates new socketEssentially, forces a fast send of a new SYN!

19

Page 19: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Tearing Down the Connection

Closing (each end of) the connectionFinish (FIN) to close and receive remaining bytesAnd other host sends a FIN ACK to acknowledgeReset (RST) to close and not receive remaining

bytes

SYN

SYN

AC

K

AC

KD

ata

FIN

AC

K

AC

K

timeA

BFIN

AC

K

20

Page 20: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Sending/Receiving the FIN Packet

Sending a FIN: close()Process is done sending

data via socketProcess invokes

“close()”Once TCP has sent all

the outstanding bytes…… then TCP sends a FIN

Receiving a FIN: EOFProcess is reading

data from socketEventually, read

call returns an EOF

21

Page 21: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Data transmission

Page 22: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP: Byte-streamService: Byte-stream

Application reads or writes a stream of bytes to the transport

Issue: IP is packet-orientedSolution: TCP maintains a local buffer

Chop the stream into packets and transmit (sender)Coalesce data from packets to form a stream (receiver)

Page 23: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP “Stream of Bytes” Service

By te 0

By te 1

By te 2

By te 3

By te 0

By te 1

By te 2

By te 3

Host A

Host B

By te 8 0

By te 8 0

24

Page 24: Transport Layer  ECE544: Communication Networks-II, Spring 2013

…Emulated Using TCP “Segments”

By te 0

By te 1

By te 2

By te 3

By te 0

By te 1

By te 2

By te 3

Host A

Host B

By te 8 0

TCP Data

TCP Data

By te 8 0

Segment sent when:1. Segment full (Max Segment Size),2. Not full, but times out, or3. “Pushed” by application

25

Page 25: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP SegmentIP packet

No bigger than Maximum Transmission Unit (MTU)

E.g., up to 1500 bytes on an Ethernet link

TCP packetIP packet with a TCP header and data insideTCP header is typically 20 bytes long

TCP segmentNo more than Maximum Segment Size (MSS)

bytesE.g., up to 1460 consecutive bytes from the

stream

IP HdrIP Data

TCP HdrTCP Data (segment)

26

Page 26: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Sequence NumberHost A

Host B

TCP Data

TCP Data

ISN (initial sequence number)

Sequence number = 1st byte

By te 8 1

27

Page 27: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Reliable Delivery on a Lossy Channel With Bit Errors

Page 28: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Challenges of Reliable Data Transfer

Over a perfectly reliable channel: Done

Over a channel with bit errorsReceiver detects errors and requests

retransmission

Over a lossy channel with bit errorsSome data missing, others corruptedReceiver cannot easily detect loss

Over a channel that may reorder packetsReceiver cannot easily distinguish loss vs. out-of-

order

30

Page 29: Transport Layer  ECE544: Communication Networks-II, Spring 2013

An AnalogyAlice and Bob are talking

What if Alice couldn’t understand Bob?Bob asks Alice to repeat what she said

What if Bob hasn’t heard Alice for a while?Is Alice just being quiet? Has she lost

reception?How long should Bob just keep on talking?Maybe Alice should periodically say “uh huh”… or Bob should ask “Can you hear me

now?”

31

Page 30: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Take-Aways from the ExampleAcknowledgments from receiver

Positive: “okay” or “uh huh” or “ACK”Negative: “please repeat that” or “NACK”

Retransmission by the senderAfter not receiving an “ACK”After receiving a “NACK”

Timeout by the sender (“stop and wait”)Don’t wait forever without some

acknowledgment

32

Page 31: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP Support for Reliable DeliveryDetect bit errors: checksum

Used to detect corrupted data at the receiver…leading the receiver to drop the packet

Detect missing data: sequence numberUsed to detect a gap in the stream of bytes... and for putting the data back in order

Recover from lost data: retransmissionSender retransmits lost or corrupted dataTwo main ways to detect lost packets

33

Page 32: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP AcknowledgmentsHost A

Host B

TCP Data

TCP Data

ISN (initial sequence number)

Sequence number = 1st byte

ACK sequence number = next expected byte

34

Page 33: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Automatic Repeat reQuest (ARQ)

ACK and timeoutsReceiver sends ACK

when it receives packet

Sender waits for ACK and times out

Simplest ARQ protocolStop and waitSend a packet, stop and

wait until ACK arrives

35

Time

Packet

ACKTim

eou

t

Sender Receiver

Page 34: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Quick TCP Math• Initial Seq No = 501. Sender sends 4500 bytes

successfully acknowledged. Next sequence number to send is:

(A) 4501 (B) 5000 (C) 5001 (D) 5002

• Next 1000 byte TCP segment received. Receiver acknowledges with ACK number:

(A) 5001 (B) 6000 (C) 6001

36

Page 35: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Flow Control:TCP Sliding Window

Page 36: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Sliding Window: MotivationStop-and-wait is inefficient

Only one TCP segment is “in flight” at a time

Consider: 1.5 Mbps link with 50 ms round-trip-time (RTT)Assume segment size of 1 KB (8 Kbits)8 Kbits/segment at 50 msec/segment 160 KbpsThat’s 11% of the capacity of 1.5 Mbps link

39

Page 37: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Sliding WindowAllow a larger amount of data “in flight”

Allow sender to get ahead of the receiver… though not too far ahead

Sending process Receiving process

Last byte ACKedLast byte sent

TCP TCP

Next byte expected

Last byte written Last byte read

Last byte received40

Page 38: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Receiver BufferingReceive window size

Amount that can be sent without acknowledgmentReceiver must be able to store this amount of

data

Receiver tells the sender the windowTells the sender the amount of free space left

Window Size

OutstandingUn-ack’d data

Data OK to send

Data not OK to send yet

Data ACK’d

41

Page 39: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP: Flow Control Flow Control

“Prevent sender from overrunning the capacity (buffer) of the receiver” Solution: Use adaptive receiver window size

Goal is to keep (C) – (A) < MaxRcvBuffer Every packet carries ACK and AdvertisedWindow

Sending Appl Receiving Appl

LastByteAcked (J) (K) LastByteSent

(I) LastByteWritten

(B) NextByteExpected

(C) LastByteRcvd

LastByteRead(A)TCP TCP

AdvertisedWindow = MaxRcvBuffer- ((NextByteExp-1)-LastByteRead)

LastByteSent (K) – LastByteAcked (J) <= AdvertisedWindowEffWin = AdvertisedWin - (LastByteSent-LastByteAcked)LastByteWritten – LastByteAcked <= MaxSendBuffer

Page 40: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Optimizing Retransmissions

43

Page 41: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Reasons for Retransmission

44

Packet

ACK

Tim

eou

t Packet

ACK

Tim

eou

t

Packet

Tim

eou

t

Packet

ACK

Tim

eou

t

Packet

ACK

Tim

eou

tPacket

ACK

Tim

eou

t

ACK lostDUPLICATE PACKET

Packet lost Early timeoutDUPLICATEPACKETS

Page 42: Transport Layer  ECE544: Communication Networks-II, Spring 2013

How Long Should Sender Wait?Sender sets a timeout to wait for an ACK

Too short: wasted retransmissionsToo long: excessive delays when packet lost

TCP sets timeout as a function of the RTTExpect ACK to arrive after an “round-trip

time”… plus a fudge factor to account for queuing

But, how does the sender know the RTT?Running average of delay to receive an ACK

45

Page 43: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP TimeoutIssue: RTT in a wide area network varies

substantiallySolution: Adaptive Timeout

Original Algorithm: EstimatedRTT = x EstimatedRTT + (1-) x

SampleRTT

Timeout = β x EstimatedRTT (β = 2)

Problem Does not distinguish whether the ACK is for original

transmission or retransmissionConstant β is not good.

Assumes constant variance

10

Page 44: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP TimeoutKarn/Partridge Algorithm

Whenever TCP retransmits a segment, it stops taking samples of the RTTOnly measure SampleRTT for segments that have been

sent only onceEach time TCP retransmits, set the next timeout to be

twice the last timeoutRelieves congestion

Jacobson/Karels Algorithm: Adaptive variance (uses mean variance)

Difference = SampleRTT - EstimatedRTTEstimatedRTT = EstimatedRTT + ( x Difference) → (same as

in original)Deviation = Deviation + (|Difference|- Deviation)Timeout = x EstimatedRTT + x Deviation(default: set = 1 and = 4 )

10

Page 45: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP Deadlock TCP Deadlock

receiver advertises a window size of 0, the sender stops sending data

the window size update from the receiver is lost

To solve it:the sender starts the persist timer when

AdvertisedWindow = 0When the persist timer expires, the sender

sends a small packet

Page 46: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Triggering TransmissionWhen to transmit a segment:

small segments subject to large overheadReach max segment size (MSS): the size

of the largest segment TCP can send without causing the local IP to fragmentMSS = local MTU – IP & TCP header

The sending process explicitly ask the TCP to transmit, “push”

Page 47: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Congestion

When the network cannot support the sender’s rateQueues at the network elements overflow

Source1

Source2

Source3

Dest2

Dest1

Even with flow control packets might not reach the

destination

Page 48: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Congestion Control vs. Flow ControlCongestion Control

Mechanism to prevent sender from overrunning the capacity of the network When network is the bottleneck

Flow Control Mechanism to prevent sender from

overrunning the capacity of the receiverWhen receiver is the bottleneck

Page 49: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Congestion Control: Design ApproachMaintain another window at the sender called

CongestionWindow (cwnd)CongestionWindow is the max number of packets allowed

in the networkNumber of unACKed packets at the sender.

Key: How to calculate congestion window (cwnd)Various approaches possibleTCP estimates it based on observed packet losses

Assumes packet loss as indication of congestion

Since we don’t know whether the network or the receiver is the bottleneck MaxWindow = MIN(CongestionWindow,

AdvertisedWindow)EffectiveWin = MaxWindow – (LastByteSent –

LastByteAcked)

Page 50: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Congestion Avoidance:

(AIMD) If no congestion in the network (increase

conservatively)Increase the congestion window additively every RTT

If congestion in the network (decrease aggressively)Decrease the congestion window multiplicatively,

immediately

How is congestion detected? Estimated (more later)

Every RTT w = w + 1w = cwnd in segments

Every ACK reception w = w + 1/ww = cwnd in segments

Every ACK reception cwnd = cwnd +

MSS*(MSS/cwnd)cwnd in bytes

cwnd = cwnd/2 cwnd in bytes

Page 51: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Congestion Avoidance: (AIMD)

TCP’s saw tooth patternIssues with additive increase

takes too long to ramp up a connection from the beginning

The entire advertised window may be reopened when a lost packet retransmitted and a single cumulative ACK is received by the sender

TimeCongest

ionW

indow

Siz

e

Startup time

Page 52: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP “Slow Start”: To start quickly!

Maintain another variable slow start threshold (ssthresh) Last known stable rate If (cwnd > ssthresh)

State = congestion avoidanceElse

State = slow start In Slow start

Increase the congestion window exponentially every RTT

Key: How is ssthresh calculated?

Every ACK reception w = w + 1w = cwnd in segments

Every ACK reception cwnd = cwnd + MSScwnd in bytes

Page 53: Transport Layer  ECE544: Communication Networks-II, Spring 2013

TCP: Congestion Detection and Retransmit

Loss of packet indicates congestionTimer Timeouts (No ACK)

Set according to Jacobson/Karels algorithmOn timer timeout

ssthresh = max(2*MSS, effwin/2); cwnd = MSSNotice this will cause TCP to go into slow start

Issue: takes a long time to detect a packet lossAffects throughput

Any other quicker way of detecting a packet loss?

Page 54: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Fast Retransmit

Observation: A series of duplicate ACKs might mean a packet loss

SolutionEvery time receiver

receives a packet (out-of-order), sends a duplicate ACK

Sender retransmit the missing packet after it receives some number of duplicate ACKs (e.g. 3 duplicate ACKs)

Fast Retransmit does not replace timeouts

Issue: Reduces latency (early retransmit) but still incurs loss in throughput (slow start after packet loss )

ACK 1

ACK 2

ACK 2

ACK 2

ACK 2

ACK 6

PKT 1PKT 2

PKT 4

PKT 5PKT 6

PKT 3Retran

PKT 3

Page 55: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Fast Recovery

Transmit a packet for every ACK received till the retransmitted packet is ACK’dssthresh= (2*MSS,

cwdn/2); cwnd = sshthred + 3

On every ACK will the ACK of retransmitted packet cwnd = cwnd + 1

On reception of ACK of retransmitted packet Start congestion

avoidance instead of slow startcwnd = ssthresh

Page 56: Transport Layer  ECE544: Communication Networks-II, Spring 2013

Homework5.13 (3rd ed and 4th ed)5.165.285.345.39

Due 4/5