1
CSE 524: Lecture 12
Transport Layer (Part 2)
2
Administrative
• Exam
  – Still being graded
  – Will be returned on Wednesday, guaranteed
3
Transport Layer
• Last class
  – Transport layer functions
• This class
  – Specific transport layers
4
Specific transport layers
• UDP
  – unreliable (“best-effort”)
  – unordered
  – unicast or multicast delivery
• TCP
  – reliable
  – in-order
  – unicast
• SCTP (will not cover in class)
  – See http://www.ietf.org/rfc/rfc2960.txt
  – reliable
  – optional ordering
  – unicast
5
TL: UDP and Transport Layer Functions
• Demux to upper layer
  – UDP port field
• Quality of service
  – none
• Security
  – none
• Delivery semantics
  – Unordered
  – Unicast or multicast
• Flow control
  – none
• Congestion control
  – none
• Reliable data transfer
  – none, but data integrity provided by checksum
6
TL: UDP: User Datagram Protocol
• http://www.rfc-editor.org/rfc/rfc768.txt
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
  – lost
  – delivered out of order to app
• connectionless:
  – no handshaking between UDP sender, receiver
  – each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
7
TL: UDP: more
• often used for streaming multimedia apps
  – loss tolerant
  – rate sensitive
• other UDP uses (why?):
  – DNS
  – SNMP
• reliable transfer over UDP: add reliability at application layer
  – application-specific error recovery!
  – many applications re-implement reliability over UDP to bypass TCP
  – new transport protocols?
UDP segment format (32 bits wide)
• source port # | dest port #
• length | checksum
  – length: in bytes of UDP segment, including header
• application data (message)
8
TL: UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: 1’s complement of the 1’s complement sum of segment contents
• sender puts checksum value into UDP checksum field
• similar to IP’s header checksum
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  – NO - error detected
  – YES - no error detected. But maybe errors nonetheless? More later ….
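The sender and receiver steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any real stack: the function names and the four-byte sample segment are made up, and the UDP pseudoheader is left out. The last line shows the "errors nonetheless" case: swapping two 16-bit words leaves the sum unchanged, so the corruption goes undetected.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, folding carries back in."""
    if len(data) % 2:                      # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Value the sender stores: the complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def looks_intact(data_with_checksum: bytes) -> bool:
    """Receiver check: data plus checksum must sum to all ones."""
    return ones_complement_sum(data_with_checksum) == 0xFFFF


segment = b"\x12\x34\x56\x78"              # made-up segment contents
csum = internet_checksum(segment)
received = segment + csum.to_bytes(2, "big")
print(hex(csum), looks_intact(received))

# Reordered words give the same sum, so this error is not detected.
swapped = b"\x56\x78\x12\x34" + csum.to_bytes(2, "big")
print(looks_intact(swapped))
```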
9
TL: TCP and Transport Layer Functions
• Demux to upper layer
• Quality of service
• Security
• Delivery semantics
• Flow control
• Congestion control
• Reliable data transfer
10
TL: TCP Overview
RFCs: 793, 1122, 1323, 2018, 2581
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
  – protocol implemented at ends (“fate-sharing”)
• flow and congestion controlled:
  – sender will not overwhelm receiver or network
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• send & receive buffers
[Figure: application writes data → socket door → TCP send buffer → segment → TCP receive buffer → socket door → application reads data]
11
TL: TCP header
TCP segment format (32 bits wide)
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | rcvr window size
• checksum | ptr urgent data
• options (variable length)
• application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
12
TL: TCP connections
• TCP sender, receiver establish “connection” before exchanging data segments
  – initialize TCP variables:
    • Initial sequence #s
    • Buffers, flow control info (e.g. RcvWindow)
    • Window scaling
• client: connection initiator
• server: contacted by client
• Java API
  – client: Socket clientSocket = new Socket("hostname", portNumber);
  – server: Socket connectionSocket = welcomeSocket.accept();
13
TL: TCP connections
• Three way handshake:
  – Step 1: client end system sends TCP SYN control segment to server
    • specifies initial seq #
    • should be random to prevent spoofing ( http://www.rfc-editor.org/rfc/rfc1948.txt )
  – Step 2: server end system receives SYN, replies with SYNACK control segment
    • ACKs received SYN
    • allocates buffers
    • specifies server-to-client initial seq. #
  – Step 3: client receives SYNACK control segment, replies with ACK and potentially data
    • ACKs received SYNACK
    • goes to established state
14
TL: TCP Connection Establishment
• A and B must agree on initial sequence number selection
• 3-way handshake
  – A → B: SYN + Seq A
  – B → A: SYN + ACK-A + Seq B
  – A → B: ACK-B
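The numbers carried by the three segments can be worked out with a short sketch. The function and field names here are illustrative only. The one fact it relies on is that a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one. `random.randrange` stands in for ISN selection and is not a secure generator.

```python
import random

MOD = 2**32   # sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three control segments exchanged during connection setup."""
    syn = {"from": "A", "flags": "SYN", "seq": client_isn}
    # A SYN consumes one sequence number, so B acknowledges client_isn + 1.
    synack = {"from": "B", "flags": "SYN+ACK", "seq": server_isn,
              "ack": (client_isn + 1) % MOD}
    ack = {"from": "A", "flags": "ACK", "seq": (client_isn + 1) % MOD,
           "ack": (server_isn + 1) % MOD}
    return [syn, synack, ack]


for segment in three_way_handshake(random.randrange(MOD), random.randrange(MOD)):
    print(segment)
```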
15
TL: TCP Sequence Number Selection
• Why not simply choose 0?
• Must avoid overlap with earlier incarnation
• Client machine seq #0, initiates connection to server with seq #0.
  – Client sends one byte and machine crashes
  – Client reboots and initiates connection again
  – Server thinks new incarnation is the same as old connection
16
TL: TCP Sequence Number Selection
• Why is selecting a random ISN important?
• Suppose machine X selects ISN based on predictable sequence
• Fred has .rhosts to allow login to X from Y
• Evil Ed attacks
  – Disables host Y – denial of service attack
  – Make a bunch of connections to host X
  – Determine ISN pattern and guess next ISN
  – Fake pkt1: [<src Y><dst X>, guessed ISN]
  – Fake pkt2: desired command
  – Attack popularized by K. Mitnick
17
TL: TCP ISN selection and spoofing attacks
[Figure: Ed attacks X, whose .rhosts trusts Y]
1. Ed floods Y continuously
2. Ed spoofs a TCP SYN from Y to X, with a spoofed Y ISN
3. X sends TCP SYNACK (ACK of spoofed Y ISN, plus X’s ISN) to Y: PACKET DROPPED!
4. Ed sends ACK with guess of X’s ISN, as if he had received the TCP SYNACK
5. Ed sends pre-canned rlogin/rsh messages (rsh echo “Ed” >> .rhosts) and spoofs acknowledgements
6. Real ACKs dropped, so Y does not reset the connection
7. Door now open: rlogin to X from Ed directly
18
TL: TCP connection setup
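[State diagram; each transition is written as event / action]
• CLOSED → LISTEN: passive OPEN / create TCB
• CLOSED → SYN SENT: active OPEN / create TCB, snd SYN
• LISTEN → SYN RCVD: rcv SYN / snd SYN, ACK
• LISTEN → SYN SENT: app SEND / snd SYN
• LISTEN → CLOSED: CLOSE / delete TCB
• SYN SENT → CLOSED: CLOSE / delete TCB
• SYN SENT → SYN RCVD: rcv SYN / snd ACK
• SYN SENT → ESTAB: rcv SYN, ACK / snd ACK
• SYN RCVD → ESTAB: rcv ACK of SYN
• SYN RCVD → (connection teardown): CLOSE / send FIN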
19
TL: TCP connections
Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs
Seq. #’s:
  – byte stream “number” of first byte in segment’s data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
  – duplicate ACKs sent when out-of-order packet received
See web trace
Java API: read via connectionSocket.getInputStream(), write via clientSocket.getOutputStream()
Simple telnet scenario (in time order):
  1. User types ‘C’. Host A → Host B: Seq=42, ACK=79, data = ‘C’
  2. Host B ACKs receipt of ‘C’, echoes back ‘C’. Host B → Host A: Seq=79, ACK=43, data = ‘C’
  3. Host A ACKs receipt of echoed ‘C’. Host A → Host A’s peer Host B: Seq=43, ACK=80
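The Seq and ACK values in the telnet scenario follow from one rule: the ACK is the received segment's sequence number plus its data length, and the Seq is the next byte the replying host has yet to send. A small sketch, assuming in-order delivery and using a made-up helper name:

```python
def reply_numbers(seg_seq: int, seg_len: int, my_next_seq: int) -> tuple[int, int]:
    """Seq and ACK a host puts on its next segment after receiving one.

    seg_seq, seg_len: sequence number and data length of the received segment.
    my_next_seq: the next byte this host has not yet sent.
    """
    ack = seg_seq + seg_len        # next byte expected from the other side
    return my_next_seq, ack


# Host A sends 'C' as Seq=42 (1 byte); host B will send byte 79 next.
b_seq, b_ack = reply_numbers(seg_seq=42, seg_len=1, my_next_seq=79)
print(f"B -> A: Seq={b_seq}, ACK={b_ack}, data='C'")

# B's echo carries 1 byte, so A now expects byte 80 and sends from byte 43.
a_seq, a_ack = reply_numbers(seg_seq=b_seq, seg_len=1, my_next_seq=43)
print(f"A -> B: Seq={a_seq}, ACK={a_ack}")
```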
20
TL: TCP connections
Closing a connection:
Client-initiated close (reverse process for server-initiated close)
Java API: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client calls close and sends FIN; server replies with ACK, calls close, and sends FIN; client replies with ACK, enters timed wait, then closed]
21
TL: TCP connections
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client (closing) sends FIN; server ACKs, then (closing) sends its own FIN; client ACKs and enters timed wait, then closed; server closed on receiving the ACK]
22
TL: TCP Half-Close
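[Figure: half-close exchange between Sender and Receiver]
• Sender → Receiver: FIN
• Receiver → Sender: FIN-ACK
• Receiver → Sender: Data write
• Sender → Receiver: Data ack
• Receiver → Sender: FIN
• Sender → Receiver: FIN-ACK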
23
TL: TCP Connection Tear-down
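[State diagram; each transition is written as event / action]
• ESTAB → FIN WAIT-1: CLOSE / snd FIN
• ESTAB → CLOSE WAIT: rcv FIN / snd ACK
• FIN WAIT-1 → FIN WAIT-2: rcv ACK of FIN
• FIN WAIT-1 → CLOSING: rcv FIN / snd ACK
• FIN WAIT-1 → TIME WAIT: rcv FIN+ACK / snd ACK
• FIN WAIT-2 → TIME WAIT: rcv FIN / snd ACK
• CLOSING → TIME WAIT: rcv ACK of FIN
• CLOSE WAIT → LAST-ACK: CLOSE / send FIN
• LAST-ACK → CLOSED: rcv ACK
• TIME WAIT → CLOSED: Timeout=2MSL / delete TCB
• (from connection setup) SYN RCVD → FIN WAIT-1: CLOSE / send FIN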
24
TL: Time Wait Issues
• Cannot close connection immediately after receiving FIN
  – What if a new connection restarts and uses same sequence number?
• Web servers, not clients, close connection first
  – Established → Fin-Waits → Time-Wait → Closed
  – Why would this be a problem?
• Time-Wait state lasts for 2 * MSL
  – MSL should be 120 seconds (is often 60s)
  – Servers often have order of magnitude more connections in Time-Wait
25
TL: TCP connections
[Figures: TCP client lifecycle; TCP server lifecycle]
26
TL: TCP Demux to upper layer
Multiplexing:
gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
multiplexing/demultiplexing:
• based on sender, receiver port numbers, IP addresses
  – source, dest port #s in each segment
  – recall: well-known port numbers for specific applications
  – Servers wait on well known ports (/etc/services)
TCP/UDP segment format (32 bits wide)
• source port # | dest port #
• other header fields
• application data (message)
27
TL: TCP Demux to upper layer
port use: simple telnet app
• host A → server B: source port: x, dest. port: 23
• server B → host A: source port: 23, dest. port: x
port use: Web server
• Web client host A → Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80
• Web client host C → Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80
• Web client host C → Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
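The Web server case can be pictured as a table lookup on the four values carried by each segment. This is only a sketch: the port numbers 1025 and 1026 stand in for x and y, and the "sockets" are plain labels rather than real socket objects.

```python
# Key: (source IP, source port, dest IP, dest port).
# Value: a label for the socket that should receive the segment's data.
connections = {
    ("A", 1025, "B", 80): "socket for client A",
    ("C", 1025, "B", 80): "socket for client C, first connection",
    ("C", 1026, "B", 80): "socket for client C, second connection",
}


def demux(src_ip, src_port, dst_ip, dst_port):
    """Return the socket for a segment, or None if no connection matches."""
    return connections.get((src_ip, src_port, dst_ip, dst_port))


# Same source port x from two hosts: the source IP tells them apart.
print(demux("A", 1025, "B", 80))
print(demux("C", 1025, "B", 80))
# Same host C, two connections: the source port tells them apart.
print(demux("C", 1026, "B", 80))
```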
28
TL: TCP Flow control
• TCP is a sliding window protocol
  – For window size n, can send up to n bytes without receiving an acknowledgement
  – When the data is acknowledged then the window slides forward
• Each packet advertises a window size
  – Indicates number of bytes the receiver has space for
• Original TCP always sent entire window
  – Congestion control now limits this
29
TL: TCP Flow control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  – RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
receiver buffering
  RcvBuffer = size of TCP Receive Buffer
  RcvWindow = amount of spare room in Buffer
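A worked example of the two quantities, assuming the receiver tracks the last byte received and the last byte read by the application, and the sender tracks the last byte sent and the last byte ACKed. Those four variable names and all the numbers are hypothetical.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(advertised: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised - in_flight)


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                # 2096 bytes of spare room
print(sender_may_send(window, 5000, 4000))   # 1096 more bytes allowed
```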
30
TL: TCP Flow control
• What happens if window is 0?
  – Receiver updates window when application reads data
  – What if this update is lost?
    • Deadlock
• TCP Persist timer
  – Sender periodically sends window probe packets
  – Receiver responds with ACK and up-to-date window advertisement
31
TL: TCP flow control enhancements
• Problem: (Clark, 1982)
  – If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
• What happens if window is small?
  – Small packet problem known as “Silly window syndrome”
    • Receiver advertises one byte window
    • Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
32
TL: TCP flow control enhancements
• Solutions to silly window syndrome
• Clark (1982)
  – receiver avoidance
  – prevent receiver from advertising small windows
  – increase advertised receiver window by min(MSS, RecvBuffer/2)
• Nagle’s algorithm (1984)
  – sender avoidance
  – prevent sender from unnecessarily sending small packets
  – http://www.rfc-editor.org/rfc/rfc896.txt
    • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged”
  – Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
  – Works for idle connections (no deadlock)
  – Works for telnet (send one-byte packets immediately)
  – Works for bulk data transfer (delay sending)
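The sender-side rule can be sketched as a toy class. It is a simplification under stated assumptions: the receiver window and timers are ignored, an ACK is assumed to cover everything outstanding, and the class and method names are invented for the example. Full-sized segments go out at once; a small segment goes out only when nothing is unacknowledged.

```python
class NagleSender:
    """Toy sender: holds back small writes while unacknowledged data is in flight."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""     # data written by the app but not yet sent
        self.unacked = 0      # bytes sent and not yet acknowledged
        self.sent = []        # segments handed to the network, for inspection

    def _send(self, segment: bytes):
        self.sent.append(segment)
        self.unacked += len(segment)

    def _flush(self):
        # Full-sized segments may always go out.
        while len(self.buffer) >= self.mss:
            self._send(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A small segment goes out only when nothing is outstanding.
        if self.buffer and self.unacked == 0:
            self._send(self.buffer)
            self.buffer = b""

    def write(self, data: bytes):
        self.buffer += data
        self._flush()

    def ack_all(self):
        """All outstanding data acknowledged; buffered data may now be sent."""
        self.unacked = 0
        self._flush()


s = NagleSender(mss=8)
for ch in b"hello":
    s.write(bytes([ch]))      # one-byte writes, as in telnet
s.ack_all()
print(s.sent)                 # first byte sent at once, the rest coalesced
```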
33
TL: TCP reliable data transfer
• Segment integrity
• Acknowledgement generation
• Retransmission
34
TL: TCP RDT segment integrity
• Checksum included in header
• Is it sufficient to just checksum the packet contents?
• No, need to ensure correct source/destination
  – Pseudoheader – portions of IP hdr that are critical
  – Checksum covers pseudoheader, transport hdr, and packet body
  – Layer violation, redundant with parts of IP checksum
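A sketch of how the pseudoheader enters the checksum, assuming IPv4, where the pseudoheader holds the source address, destination address, a zero byte, the protocol number, and the TCP length. The addresses and the all-zero 20-byte "segment" are placeholders. Changing only the destination address changes the checksum, which is how a misdelivered segment is caught.

```python
import socket
import struct


def checksum16(data: bytes) -> int:
    """Internet checksum: complement of the 16-bit one's complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, tcp_segment: bytes) -> int:
    """Checksum over IPv4 pseudoheader + TCP header + data.

    tcp_segment must have its checksum field set to zero.
    """
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),   # source address from the IP header
        socket.inet_aton(dst_ip),   # destination address from the IP header
        0,                          # zero padding
        socket.IPPROTO_TCP,         # protocol number (6)
        len(tcp_segment),           # TCP length: header plus data
    )
    return checksum16(pseudo + tcp_segment)


segment = bytes(20)                 # placeholder: an all-zero 20-byte header
a = tcp_checksum("10.0.0.1", "10.0.0.2", segment)
b = tcp_checksum("10.0.0.1", "10.0.0.3", segment)
print(hex(a), hex(b), a != b)
```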
35
TL: TCP RDT acks and timeouts
• TCP’s reliable data transfer approach
  – Cumulative acknowledgements
    • Receiver sends back the byte number it expects to receive next
    • Out of order packets generate duplicate acknowledgements
      – Receive 1, Ack 2
      – Receive 4, Ack 2
      – Receive 3, Ack 2
      – Receive 2, Ack 5
  – Retransmissions
    • Sender sends segment and sets a timer
    • Waits for an acknowledgement indicating segment was received
      – Send 1
      – Wait for Ack 2
      – No Ack 2 and timer expires
      – Send 1 again
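The receive/ack sequence above can be reproduced with a few lines. As on the slide, packets are numbered 1, 2, 3, ... rather than by byte, and the function name is made up for the example.

```python
def cumulative_acks(arrivals):
    """Yield the ACK sent after each arriving packet number.

    The ACK names the next packet the receiver expects.
    """
    received = set()
    expected = 1
    for pkt in arrivals:
        received.add(pkt)
        while expected in received:   # advance past every in-order packet
            expected += 1
        yield expected


print(list(cumulative_acks([1, 4, 3, 2])))   # [2, 2, 2, 5]
```

Packets 4 and 3 arrive while 2 is still missing, so each produces a duplicate ACK 2. Once 2 arrives, the single ACK 5 covers everything received so far.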
36
TL: TCP RDT acks and timeouts
simplified sender, assuming
• one way data transfer
• no flow, congestion control
[State diagram: sender waits for an event, handles it, and returns to waiting]
• event: data received from application above → create, send segment
• event: timer timeout for segment with seq # y → retransmit segment
• event: ACK received, with ACK # y → ACK processing
37
TL: TCP RDT acks and timeouts
Simplified TCP sender

  sendbase = initial_sequence_number
  nextseqnum = initial_sequence_number

  loop (forever) {
    switch (event) {

      event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

      event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

      event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
          cancel all timers for segments with sequence numbers < y
          sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
          increment number of duplicate ACKs received for y
          if (number of duplicate ACKs received for y == 3) {
            /* TCP fast retransmit */
            resend segment with sequence number y
            restart timer for segment y
          }
        }
    }
  } /* end of loop forever */
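The ACK-processing branch of the simplified sender can be tried out in isolation. This sketch leaves out timers, segment creation, and sequence-number wraparound. The class name is invented, and the ACK values are borrowed from the Seq=92 / ACK=100 / ACK=120 numbers used elsewhere in the lecture.

```python
class AckTracker:
    """ACK-processing half of the simplified sender (timers omitted)."""

    def __init__(self, initial_seq: int):
        self.sendbase = initial_seq   # oldest unacknowledged byte
        self.dup_acks = 0             # duplicate ACKs seen for sendbase

    def on_ack(self, y: int) -> str:
        if y > self.sendbase:
            # Cumulative ACK: every byte before y has arrived.
            self.sendbase = y
            self.dup_acks = 0
            return "new data acked"
        self.dup_acks += 1
        if self.dup_acks == 3:
            return f"fast retransmit segment {y}"
        return "duplicate ack"


sender = AckTracker(initial_seq=92)
for ack in [100, 100, 100, 100, 120]:
    print(ack, sender.on_ack(ack))
```

The first ACK=100 advances sendbase; the third duplicate of it triggers the fast retransmit without waiting for the timer.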
38
TL: TCP delayed acknowledgements
• Problem:
  – In request/response programs, you send separate ACK and Data packets for each transaction
    • Delay ACK in order to send ACK back along with data
• Solution:
  – Don’t ACK data immediately
    • Wait 200ms (must be less than 500ms – why?)
    • Must ACK every other packet
    • Must not delay duplicate ACKs
  – Without delayed ACK: 40 byte ack + data packet
  – With delayed ACK: data packet includes ACK
  – See web trace example
  – Extensions for asymmetric links
    • See later part of lecture
39
TL: TCP ACK generation [RFC 1122, RFC 2581]
Event → TCP Receiver action
• in-order segment arrival, no gaps, everything else already ACKed → delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending → immediately send single cumulative ACK
• out-of-order segment arrival, higher-than-expected seq. #, gap detected → send duplicate ACK, indicating seq. # of next expected byte
• arrival of segment that partially or completely fills gap → immediate ACK if segment starts at lower end of gap
40
TL: TCP retransmission
• Wait at least one RTT before retransmitting packet
• Importance of accurate RTT estimators:
  – Estimator too low → unneeded retransmissions
  – Estimator too high → poor throughput, slow reaction to segment loss
• RTT estimator must adapt to change in RTT
  – But not too fast, or too slow!
• Backing off the retransmission timeout
  – Exponential backoff
  – Double retransmission timer interval after every loss until successful retransmission
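Exponential backoff is simple enough to show as arithmetic. The cap `max_rto` is an assumption added for the sketch (implementations bound the timer, but the slides do not give a value), and the function name is made up.

```python
def backed_off_timeouts(initial_rto: float, losses: int, max_rto: float = 64.0):
    """Timer values used for the first transmission and each retransmission."""
    rto = initial_rto
    timeouts = [rto]
    for _ in range(losses):
        rto = min(rto * 2, max_rto)   # double after every loss, up to the cap
        timeouts.append(rto)
    return timeouts


print(backed_off_timeouts(1.0, 4))    # [1.0, 2.0, 4.0, 8.0, 16.0]
```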
41
TL: TCP retransmission scenarios
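Lost ACK scenario (in time order):
1. Host A → Host B: Seq=92, 8 bytes data
2. Host B → Host A: ACK=100, lost (X)
3. Host A’s timeout expires; Host A → Host B: Seq=92, 8 bytes data (retransmission)
4. Host B → Host A: ACK=100
Premature timeout, cumulative ACKs (in time order):
1. Host A → Host B: Seq=92, 8 bytes data
2. Host A → Host B: Seq=100, 20 bytes data
3. Host B → Host A: ACK=100, then ACK=120
4. Seq=92 timeout expires before ACK=100 arrives; Host A → Host B: Seq=92, 8 bytes data (retransmission)
5. Host B → Host A: ACK=120 (cumulative), within the Seq=100 timeout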
42
TL: Initial Round-trip Estimator
• Round trip times exponentially averaged:
  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
  – Recommended value for x: 0.1-0.2
    • 0.125 for most TCP’s
  – Influence of given sample decreases exponentially fast
• Retransmit timer set to β*RTT, where β = 2
  – Every time timer expires, RTO exponentially backed-off
  – Like Ethernet
• Not good at preventing spurious timeouts
43
TL: Jacobson’s Retransmission Timeout
• Key observation:
  – At high loads round trip variance is high
  – Need larger safety margin with larger variations in RTT
• Solution:
  – Base RTO value on RTT and its standard deviation
44
TL: Jacobson’s Retransmission Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
Setting the timeout
• EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
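The three formulas above can be run on a handful of samples to see the safety margin react. This is a sketch under assumptions: the initial Deviation of half the first sample, the sample values, and the class name are all choices made for the example, and x = 0.125 is used for both averages as on the slide.

```python
class RttEstimator:
    """EstimatedRTT and Deviation updated per sample; Timeout derived from both."""

    def __init__(self, first_sample: float, x: float = 0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2      # assumed starting value

    def update(self, sample_rtt: float) -> float:
        x = self.x
        # New EstimatedRTT first, then Deviation against the updated estimate.
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = ((1 - x) * self.deviation
                          + x * abs(sample_rtt - self.estimated_rtt))
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=0.100)
for sample in [0.100, 0.100, 0.300, 0.100]:
    print(f"sample={sample:.3f}  timeout={est.update(sample):.3f}")
```

With steady samples the timeout shrinks toward EstimatedRTT; the single 0.300 s sample pushes Deviation up, and the timeout widens with it.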
45
TL: Retransmission Ambiguity
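[Figure: two A-to-B timelines, each showing an original transmission, an RTO, a retransmission, and a single ACK; one timeline includes a loss (X). The ACK cannot be matched to the original transmission or the retransmission, so SampleRTT is ambiguous.]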
46
TL: Karn’s algorithm
• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
  – Don’t count RTT sample on ACKs for this segment
  – Keep backed off time-out for next packet
  – Reuse RTT estimate only after one successful transmission
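A toy version of these rules, to make the bookkeeping concrete. It is a simplification: `2 * rtt` stands in for recomputing the RTO from the estimator, and the class name, method names, and numbers are hypothetical.

```python
class KarnTimer:
    """Tracks the RTO and decides which ACKs may supply RTT samples."""

    def __init__(self, rto: float):
        self.rto = rto
        self.samples = []       # RTT samples accepted for the estimator

    def on_timeout(self):
        self.rto *= 2           # back off; this value carries over to later packets

    def on_ack(self, rtt: float, was_retransmitted: bool):
        if was_retransmitted:
            return              # ambiguous sample: ignore it, keep backed-off RTO
        self.samples.append(rtt)
        self.rto = 2 * rtt      # stand-in for recomputing the RTO from the estimator


timer = KarnTimer(rto=1.0)
timer.on_timeout()                               # segment lost, RTO doubles
timer.on_ack(rtt=1.8, was_retransmitted=True)    # sample discarded
print(timer.rto, timer.samples)
timer.on_ack(rtt=0.4, was_retransmitted=False)   # clean sample accepted
print(timer.rto, timer.samples)
```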
47
TL: Timer Granularity
• Many TCP implementations set RTO in multiples of 200, 500, or 1000ms
• Why?
  – Avoid spurious timeouts – RTTs can vary quickly due to cross traffic
  – Make timer interrupts efficient