Post on 01-Jan-2016
End-to-End Protocols
Outline
• Simple Demultiplexer
• Reliable Byte-Stream
• Remote Procedure Call
• Performance
End-to-End Protocols
• Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to flow control the sender
– support multiple application processes on each host
• The underlying best-effort network may
– drop messages
– reorder messages
– deliver duplicate copies of a given message
– limit messages to some finite size
– deliver messages after an arbitrarily long delay
Simple Demultiplexer (UDP)
• User Datagram Protocol (UDP) - Unreliable and unordered datagram service
• Adds multiplexing to allow multiple application processes on each host to share the network
• A port is the abstraction of the communication endpoints.
– Use a <port/mailbox, host> pair to identify a process
– Endpoints are identified by ports
– Servers have well-known ports (DNS: 53, talk: 517)
– See /etc/services on Unix
• A port is implemented by a message queue.
• UDP has no flow control.
• UDP header format
– Optional checksum: pseudo header + UDP header + data
– Pseudo header: protocol number, source IP address, destination IP address, and UDP length field
– The pseudo header lets the receiver verify that the message has been delivered between the correct two endpoints.
[Figure: UDP header format, 32 bits per row (bit positions 0, 16, 31)]
Row 1: SrcPort (bits 0-15) | DstPort (bits 16-31)
Row 2: Length (bits 0-15) | Checksum (bits 16-31)
Row 3: Data
Reliable Byte-Stream (TCP)
Outline
• Connection Establishment/Termination
• Sliding Window Revisited
• Flow Control
• Adaptive Timeout
TCP Overview
• Transmission Control Protocol (TCP) is a reliable, connection-oriented, byte-stream service.
• A byte-stream service
– application writes bytes
– TCP sends segments
– application reads bytes
• TCP is a full-duplex protocol.
• TCP supports a demultiplexing mechanism.
[Figure: how TCP manages a byte stream. The sending application process writes bytes into the TCP send buffer; TCP transmits segments; the receiving application process reads bytes from the TCP receive buffer.]
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network
• TCP uses the sliding window algorithm.
Data Link Versus Transport
• Potentially have many connections between different hosts
– need explicit connection establishment and termination
• Potentially different RTT
– need adaptive timeout mechanism
• Potentially long delay in network
– need to be prepared for arrival of very old packets
• Potentially different capacity at destination
– need to accommodate different node capacity
• Potentially different network capacity
– need to be prepared for network congestion
TCP Segment Format
• The packets exchanged between TCP peers are called segments.
• How does TCP decide that it has enough bytes to send a segment?
– TCP maintains a variable, called the maximum segment size (MSS), and it sends a segment as soon as it has collected MSS bytes from the sending process.
– TCP supports a push operation, and the sending process invokes this operation to flush the buffer of unsent bytes.
– The final trigger is a timer that fires periodically.
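Segment Format
[Figure: TCP header format, 32 bits per row (bit positions 0, 4, 10, 16, 31)]
Row 1: SrcPort (bits 0-15) | DstPort (bits 16-31)
Row 2: SequenceNum
Row 3: Acknowledgment
Row 4: HdrLen (bits 0-3) | 0 (bits 4-9) | Flags (bits 10-15) | AdvertisedWindow (bits 16-31)
Row 5: Checksum (bits 0-15) | UrgPtr (bits 16-31)
Row 6: Options (variable)
Row 7: Data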
TCP Header Format
• SrcPort: source port; DstPort: destination port
• The Acknowledgment, SequenceNum, and AdvertisedWindow fields are all involved in TCP’s sliding window algorithm.
• The 6-bit Flags field is used to relay control information between TCP peers:
– SYN, FIN: establish and terminate a TCP connection
– RESET: abort the connection
– PUSH: the sender invoked the push operation
– URG: the segment contains urgent data, which extends up to UrgPtr bytes into the segment
– ACK: the Acknowledgment field is valid
Segment Format (cont)
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
• Flags
– SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
– pseudo header + TCP header + data
[Figure: the sender transmits Data (SequenceNum) to the receiver; the receiver returns Acknowledgment + AdvertisedWindow.]
Three-Way Handshake
• The algorithm used by TCP to establish and terminate a connection is called a three-way handshake.
– A timer is scheduled for each of the first two segments.
– The client and server each select an initial sequence number at random and exchange these starting sequence numbers at connection setup time.
– This protects against the chance that a segment from an earlier connection might interfere with a later one.
• TCP can be specified in a state-transition diagram.
Connection Establishment and Termination
[Figure: timeline for the three-way handshake]
1. Active participant (client) to passive participant (server): SYN, SequenceNum = x
2. Server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
3. Client to server: ACK, Acknowledgment = y + 1
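State Transition Diagram
[Figure: TCP state-transition diagram. Each transition is labeled event/action.]
CLOSED -> LISTEN: Passive open
CLOSED -> SYN_SENT: Active open/SYN
LISTEN -> SYN_RCVD: SYN/SYN + ACK
LISTEN -> SYN_SENT: Send/SYN
LISTEN -> CLOSED: Close
SYN_SENT -> SYN_RCVD: SYN/SYN + ACK
SYN_SENT -> ESTABLISHED: SYN + ACK/ACK
SYN_SENT -> CLOSED: Close
SYN_RCVD -> ESTABLISHED: ACK
SYN_RCVD -> FIN_WAIT_1: Close/FIN
ESTABLISHED -> FIN_WAIT_1: Close/FIN
ESTABLISHED -> CLOSE_WAIT: FIN/ACK
CLOSE_WAIT -> LAST_ACK: Close/FIN
LAST_ACK -> CLOSED: ACK
FIN_WAIT_1 -> FIN_WAIT_2: ACK
FIN_WAIT_1 -> CLOSING: FIN/ACK
FIN_WAIT_1 -> TIME_WAIT: ACK + FIN/ACK
FIN_WAIT_2 -> TIME_WAIT: FIN/ACK
CLOSING -> TIME_WAIT: ACK
TIME_WAIT -> CLOSED: Timeout after two segment lifetimes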
Sliding Window
• TCP’s sliding window algorithm serves several purposes:
– It guarantees the reliable delivery of data.
– It ensures that data is delivered in order.
– It enforces flow control between the sender and the receiver.
• In order to keep the sender from overrunning the receiver’s buffer, the receiver advertises a window size to the sender by specifying the AdvertisedWindow field in the TCP header.
Sliding Window Revisited
• Sending side
– LastByteAcked <= LastByteSent
– LastByteSent <= LastByteWritten
– buffer bytes between LastByteAcked and LastByteWritten
• Receiving side
– LastByteRead < NextByteExpected
– NextByteExpected <= LastByteRcvd + 1
– buffer bytes between LastByteRead and LastByteRcvd
[Figure: (a) send buffer with pointers LastByteAcked, LastByteSent, and LastByteWritten, the last advanced by the sending application; (b) receive buffer with pointers LastByteRead (advanced by the receiving application), NextByteExpected, and LastByteRcvd.]
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
• Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block the sending application if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes it is trying to write
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0: the sender keeps probing so that it learns when the window reopens
Protection Against Wrap Around
• 32-bit SequenceNum

Bandwidth             Time Until Wrap Around
T1 (1.5 Mbps)         6.4 hours
Ethernet (10 Mbps)    57 minutes
T3 (45 Mbps)          13 minutes
FDDI (100 Mbps)       6 minutes
STS-3 (155 Mbps)      4 minutes
STS-12 (622 Mbps)     55 seconds
STS-24 (1.2 Gbps)     28 seconds
Keeping the Pipe Full
• 16-bit AdvertisedWindow

Bandwidth             Delay x Bandwidth Product
T1 (1.5 Mbps)         18 KB
Ethernet (10 Mbps)    122 KB
T3 (45 Mbps)          549 KB
FDDI (100 Mbps)       1.2 MB
STS-3 (155 Mbps)      1.8 MB
STS-12 (622 Mbps)     7.4 MB
STS-24 (1.2 Gbps)     14.8 MB
Adaptive Retransmission (Original Algorithm)
• Measure SampleRTT for each segment/ACK pair
• Compute weighted average of RTT
– EstRTT = α x EstRTT + β x SampleRTT
– where α + β = 1, with α between 0.8 and 0.9 and β between 0.1 and 0.2
• Set timeout based on EstRTT
– TimeOut = 2 x EstRTT
Karn/Partridge Algorithm
• Do not sample RTT when retransmitting
• Double timeout after each retransmission
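[Figure: two sender/receiver timelines, each showing an original transmission, a retransmission, and one ACK. In (a) SampleRTT is measured from the original transmission to the ACK; in (b) it is measured from the retransmission to the ACK. The sender cannot tell which transmission the ACK belongs to.]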
Jacobson/Karels Algorithm
• New calculations for average RTT
– Diff = SampleRTT - EstRTT
– EstRTT = EstRTT + (δ x Diff)
– Dev = Dev + δ x (|Diff| - Dev)
– where δ is a factor between 0 and 1
• Consider variance when setting timeout value
– TimeOut = μ x EstRTT + φ x Dev
– where μ = 1 and φ = 4
• Notes
– algorithm only as good as granularity of clock (500 ms on Unix)
– accurate timeout mechanism important to congestion control (later)
TCP Extensions
• Implemented as header options
• Store timestamp in outgoing segments
• Extend sequence space with 32-bit timestamp (PAWS)
• Shift (scale) advertised window
Remote Procedure Call
Outline
• Basics
• Protocol Stack
• Presentation Formatting
Remote Procedure Call Basics
• Problems with sockets
– Socket programming uses a read/write (input/output) mechanism.
– This differs from the procedure calls that programs normally use.
– Input/output is not the best way to make computation location-transparent.
• A procedure call is a standard abstraction in local computation.
• Remote Procedure Call (RPC) extends procedure calls to distributed computation, as shown in Figure 5.11.
– A caller invokes execution of a procedure in the callee via the local stub procedure.
– The implicit network programming hides all network I/O code from the programmer.
– Objectives are simplicity and ease of use.
• The concept is to provide a transparent mechanism that enables the user to utilize remote services through standard procedure calls.
• Client sends request, then blocks until a remote server sends a response (reply).
• Advantages: user may be unaware of remote implementation (handled in a stub in library); uses standard mechanism.
• Disadvantages: prone to failure of components and network; different address spaces; separate process lifetimes.
RPC Components
• Protocol Stack
– BLAST: fragments and reassembles large messages
– CHAN: synchronizes request and reply messages
– SELECT: dispatches request to the correct process
• Stubs
[Figure: the caller (client) passes arguments to the client stub, which hands a request to the RPC protocol. On the callee (server) side, the RPC protocol delivers the request to the server stub, which passes the arguments to the callee. The return value travels back along the same path as a reply.]
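RPC Timeline
[Figure: client and server timelines. The client sends a request and is blocked until the reply arrives. The server is blocked until the request arrives, computes, sends the reply, and is blocked again.]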
SunRPC
• IP implements BLAST-equivalent
– except no selective retransmit
• SunRPC implements CHAN-equivalent
– except not at-most-once
• UDP + SunRPC implement SELECT-equivalent
– UDP dispatches to program (ports bound to programs)
– SunRPC dispatches to procedure within program
[Figure: protocol stack, top to bottom: SunRPC, UDP, IP, ETH]
Sun RPC
• It was designed for client-server communication in the Sun Network File System (NFS).
• UDP or TCP can be used. If UDP is used, the message length is restricted to 64 KB in theory, but to 8 - 9 KB in practice.
• Sun XDR was originally intended for specifying external data representations; it is also used as the interface definition language.
• Valid data types supported by XDR include int, unsigned int, long, structure, fixed array, string (null-terminated char *), and binary encoded data (for other data types such as lists).
Sun XDR
• A program number and a version number are supplied.
• Each procedure definition is identified by a procedure number.
• A single input parameter and a single output result are passed.
Files interface in Sun XDR
const MAX = 1000;
typedef int FileIdentifier;
typedef int FilePointer;
typedef int Length;
struct Data {
    int length;
    char buffer[MAX];
};
struct writeargs {
    FileIdentifier f;
    FilePointer position;
    Data data;
};
struct readargs {
    FileIdentifier f;
    FilePointer position;
    Length length;
};
program FILEREADWRITE {
    version VERSION {
        void WRITE(writeargs) = 1;
        Data READ(readargs) = 2;
    } = 2;
} = 9999;
Sun RPC
• The interface compiler rpcgen generates the following from the interface definition:
– client stub procedures
– server main procedure, dispatcher, and server stub procedures
– XDR marshalling and unmarshalling procedures used by the dispatcher and by the client and server stub procedures
• Binding:
– The portmapper records program number, version number, and port number.
– If there are multiple instances running on different machines, clients make multicast remote procedure calls by broadcasting them to all the port mappers.
RPC Interface Compiler
Example (Sun RPC)
• long sum(long) example
– running "client localhost 10" prints "result: 55"
• Need RPC specification file (sum.x)
– defines procedure name, arguments & results
• Run the interface compiler: rpcgen sum.x
– generates sum.h, sum_clnt.c, sum_xdr.c, sum_svc.c
– sum_clnt.c & sum_svc.c: stub routines for client & server
– sum_xdr.c: XDR (External Data Representation) code that takes care of data type conversions
RPC XDR File (sum.x)
struct sum_in {
    long arg1;
};
struct sum_out {
    long res1;
};
program SUM_PROG {
    version SUM_VERS {
        sum_out SUMPROC(sum_in) = 1;   /* procedure number = 1 */
    } = 1;                             /* version number = 1 */
} = 0x32123000;                        /* program number */
• Program numbers are usually assigned as follows:
0x00000000 - 0x1fffffff   defined by Sun
0x20000000 - 0x3fffffff   defined by user
0x40000000 - 0x5fffffff   transient
0x60000000 - 0xffffffff   reserved
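RPC Client Code (rsum.c)
#include <stdio.h>
#include <stdlib.h>
#include "sum.h"   // generated by rpcgen
int main(int argc, char *argv[]) {
    CLIENT *cl; sum_in in; sum_out *outp;
    if (argc != 3) {
        fprintf(stderr, "usage: %s <host> <n>\n", argv[0]);
        return 1;
    }
    // create RPC client handle; need to know server's address
    if ((cl = clnt_create(argv[1], SUM_PROG, SUM_VERS, "tcp")) == NULL) {
        clnt_pcreateerror(argv[1]);   // report why the handle could not be created
        return 1;
    }
    in.arg1 = atol(argv[2]);   // n: the server returns 1 + 2 + ... + n
    // Call RPC; note convention of RPC function naming
    if ((outp = sumproc_1(&in, cl)) == NULL) {
        clnt_perror(cl, argv[1]);     // report why the call failed
        return 1;
    }
    printf("result: %ld\n", outp->res1);
    return 0;
}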
RPC Server Code (sum_serv.c)
#include "sum.h"
// server function has different name than client call
sum_out *sumproc_1_svc(sum_in *inp, struct svc_req *rqstp) {
    // static: the server stub reads the result after this function returns
    static sum_out out;
    long i;
    out.res1 = inp->arg1;
    for (i = inp->arg1 - 1; i > 0; i--)
        out.res1 += i;
    return &out;
}
// server's main() is generated by rpcgen
Compilation and Linking
rpcgen sum.x
cc -c rsum.c -o rsum.o
cc -c sum_clnt.c -o sum_clnt.o
cc -c sum_xdr.c -o sum_xdr.o
cc -o client rsum.o sum_clnt.o sum_xdr.o
cc -c sum_serv.c -o sum_serv.o
cc -c sum_svc.c -o sum_svc.o
cc -o server sum_serv.o sum_svc.o sum_xdr.o
Internal Details of Sun RPC
• Initialization
– Server runs: registers the RPC program with the port mapper on the server host (listed by rpcinfo -p)
– Client runs: clnt_create contacts the server's port mapper and establishes a TCP connection with the server (or a UDP socket)
• Client
– Client calls a local procedure (client stub: sumproc_1) that is generated by rpcgen.
– Client stub packages arguments, puts them in standard format (XDR), and prepares network messages (marshaling).
– Network messages are sent to the remote system by the client stub.
– Network transfer is accomplished with TCP or UDP.
• Server
– Server stub (generated by rpcgen) unmarshals arguments from network messages.
– Server stub executes the local procedure (sumproc_1_svc), passing arguments received from network messages.
– When the server procedure is finished, it returns to the server stub with return values.
– Server stub converts return values (XDR), marshals them into network messages, and sends them back to the client.
• Back to Client
– Client stub reads network messages from the kernel.
– Client stub returns results to the client function.
Details of RPC
SunRPC Header Format
• XID (transaction id) is similar to CHAN’s MID
• Server does not remember last XID it serviced
• Problem if client retransmits request while reply is in transit
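[Figure: SunRPC header formats, one 32-bit word (bits 0 to 31) per field unless marked variable]
Request: XID | MsgType = CALL | RPCVersion = 2 | Program | Version | Procedure | Credentials (variable) | Verifier (variable) | Data
Reply: XID | MsgType = REPLY | Status = ACCEPTED | Data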