Campbell,Brice RShah,Chirag Pravin Eidschun,John S Gilley,Michael W Keys,Adam K

Campbell, Brice R; Shah, Chirag Pravin; Eidschun, John S; Gilley, Michael W; Keys, Adam K; Little, James E; McLendon, Elizabeth Ann; Shah, Sunil Rajesh; Ward, Randy W

Elliott, Stacey DeWitt; George, Paul; Iychettira, Prem Belliappa; Kikkeri, Nikhil Divakar; Neelakantan, Madankumar; Pathak, Shomik Tridib; Raghunathan, Nithya; Reddy, Karri Baskar Srinivas; Saranu, Dharmendra ADD; Srivastava, Abhinay; Wen, Jianshou

If your name is not on the list, leave the room immediately. The registrar will have to update your status.

11 states in the TCP state machine:

CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, CLOSE_WAIT, LAST_ACK, FIN_WAIT1, CLOSING, FIN_WAIT2, TIME_WAIT

[Figure: TCP state transition diagram. Client transitions are shown in blue, server transitions in red.]

The state transition diagram above provides for the possibility of simultaneous opens and closes (possible but rare). These will be discussed in detail at a later date.

–If the application calls ‘close’ before receiving an end-of-file (active close), the transition is from ESTABLISHED to FIN_WAIT1.

–If the application receives a FIN in the ESTABLISHED state (passive close), the transition is to the CLOSE_WAIT state.
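The two close transitions above, and the rest of the termination path through the diagram, can be written as a small transition function. This is a minimal sketch under stated assumptions: the event names (APP_CLOSE, RCV_FIN, RCV_ACK, TIMEOUT_2MSL) and the function name tcp_close_transition are hypothetical, only connection termination is modeled, and any event the sketch does not handle simply leaves the state unchanged (a real TCP handles more cases, e.g., a FIN and its ACK arriving together in FIN_WAIT1).

```c
/* The 11 states of the TCP state machine, as listed above. */
enum tcp_state {
    CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, CLOSE_WAIT,
    LAST_ACK, FIN_WAIT1, CLOSING, FIN_WAIT2, TIME_WAIT
};

/* Hypothetical events: only those involved in connection termination. */
enum tcp_event {
    APP_CLOSE,     /* the application calls close() */
    RCV_FIN,       /* a FIN arrives from the peer */
    RCV_ACK,       /* the ACK of our FIN arrives */
    TIMEOUT_2MSL   /* the 2MSL timer expires */
};

/* Return the next state; unhandled events leave the state unchanged
 * (a simplification of the full diagram). */
enum tcp_state tcp_close_transition(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ESTABLISHED:
        if (e == APP_CLOSE) return FIN_WAIT1;  /* active close: send FIN */
        if (e == RCV_FIN)   return CLOSE_WAIT; /* passive close: send ACK */
        break;
    case CLOSE_WAIT:
        if (e == APP_CLOSE) return LAST_ACK;   /* now send our own FIN */
        break;
    case LAST_ACK:
        if (e == RCV_ACK)   return CLOSED;
        break;
    case FIN_WAIT1:
        if (e == RCV_ACK)   return FIN_WAIT2;
        if (e == RCV_FIN)   return CLOSING;    /* simultaneous close */
        break;
    case FIN_WAIT2:
        if (e == RCV_FIN)   return TIME_WAIT;  /* send the final ACK */
        break;
    case CLOSING:
        if (e == RCV_ACK)   return TIME_WAIT;
        break;
    case TIME_WAIT:
        if (e == TIMEOUT_2MSL) return CLOSED;
        break;
    default:
        break;
    }
    return s;
}
```

For example, tcp_close_transition(ESTABLISHED, APP_CLOSE) yields FIN_WAIT1 (active close), and only the active-close path passes through TIME_WAIT.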

These states are displayed by netstat, a networking tool that we will use in class and in the assignments.

The ACK to the client can be piggybacked on the server's reply (diagram, p. 39).

TIME_WAIT state.

–The end that performs the active close goes through this state.

–It stays in this state for 2MSL, twice the MSL (Maximum Segment Lifetime).

–Remember that IP (datagrams) has a TTL, while TCP (segments) has an MSL.

–RFC 1122 specifies an MSL of 2 minutes.

–BSD-derived implementations have traditionally used 30 seconds.

•Therefore the duration of the TIME_WAIT state (2MSL) is between 1 and 4 minutes.

–The MSL is the maximum amount of time that any given packet (IP datagram) can live in the Internet.

–Scenario:

•A connection is established, a packet gets caught in a routing loop, TCP times out and retransmits, and then the original packet reappears (less than MSL seconds later).

•What should be done with this ‘wandering duplicate’?

• Call it Ishmael?

TIME_WAIT exists to implement reliable full-duplex connection termination and to allow old duplicates to expire in the network (Ishmael doesn’t make it out of the desert).

Scenario 2: assume that the final ACK of the TCP termination is lost. The server will then resend its final FIN.

The client must therefore maintain state information that allows it to resend the final ACK (if it did not, it would respond to the FIN with an RST, which the server would interpret as an error).

This is why the end that performs the active close is the end that stays in the TIME_WAIT state.

Second reason for the TIME_WAIT state.

Assume a TCP connection between 192.12.34.23 port 48 and 34.210.123.78 port 24.

The connection is closed and some time later reopened between the same addresses and ports (a new incarnation).

What if wandering duplicates from the first connection arrive during the second connection? They could be mistaken for data belonging to the new incarnation.

Therefore TCP will not allow a new incarnation of a connection that is in TIME_WAIT, i.e., until 2MSL seconds have passed. By then, any wanderers from the old incarnation have expired.

• One exception (with Comp Sci & Physics there are always exceptions): in BSD-derived implementations, a new incarnation of a connection in TIME_WAIT is allowed if the arriving SYN has a sequence number greater than the ending sequence number of the previous incarnation.
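The 2MSL arithmetic and the BSD exception can be sketched in C. The function names are hypothetical, and the check is a simplification that assumes the only inputs are the time spent in TIME_WAIT and the two sequence numbers. Because TCP sequence numbers wrap at 2^32, "greater than" is computed with serial-number arithmetic rather than a plain comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Duration of TIME_WAIT: twice the MSL. MSL = 30 s (BSD) gives 60 s;
 * MSL = 120 s (RFC 1122) gives 240 s, i.e., between 1 and 4 minutes. */
unsigned time_wait_seconds(unsigned msl_seconds)
{
    return 2 * msl_seconds;
}

/* a is "after" b if (a - b), read as a signed 32-bit value, is positive.
 * This handles wraparound (relies on two's-complement conversion). */
bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical check: may an arriving SYN start a new incarnation of a
 * connection whose previous incarnation is in TIME_WAIT? */
bool new_incarnation_allowed(unsigned elapsed_seconds, unsigned msl_seconds,
                             uint32_t syn_seq, uint32_t last_seq)
{
    if (elapsed_seconds >= time_wait_seconds(msl_seconds))
        return true;                      /* TIME_WAIT has expired */
    return seq_after(syn_seq, last_seq);  /* the BSD exception */
}
```

With MSL = 30 s, a SYN arriving 10 seconds into TIME_WAIT is accepted only if its sequence number is beyond the old incarnation's final one; after 60 seconds any SYN is accepted.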

Concurrent TCP server: a child process is spawned, using the Unix fork() function, to handle each new connection.

If the local IP address is shown as * (the wildcard address), the following applies:

The server cannot specify a list of multiple addresses. Therefore the wildcard allows the server to make a one-or-any choice: it receives connection requests either on one specific local address or on any local address.

When the server receives and accepts a client connection, it forks, and the child handles the client. The client's end of the connection uses an ephemeral port number, while the child's connected socket keeps the server's port.

This means that TCP cannot demultiplex incoming segments based simply on the destination port number. Instead, all 4 elements of the socket pair must be examined to determine which socket receives a segment.
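A simplified demultiplexing lookup, assuming a hypothetical in-memory table of socket pairs in which 0 marks a wildcard (*) field. Among the sockets whose fields match the incoming segment, the one with the fewest wildcards wins, so a fully specified connected socket takes precedence over the listening socket that shares its local port. This illustrates the idea of the lookup, not any particular kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

#define WILDCARD 0u   /* hypothetical marker for "*" (any address or port) */

/* A socket pair (4-tuple), kept in host byte order for simplicity. */
struct sock_pair {
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t foreign_addr;
    uint16_t foreign_port;
};

/* A socket field matches if it is a wildcard or equals the segment's field. */
static int field_match(uint32_t sock_val, uint32_t seg_val)
{
    return sock_val == WILDCARD || sock_val == seg_val;
}

/* Return the index of the socket that should receive a segment whose
 * 4-tuple is *seg (fewest wildcards wins), or -1 if no socket matches. */
int demux(const struct sock_pair *socks, size_t n, const struct sock_pair *seg)
{
    int best = -1, best_wild = 4;   /* at most 3 wildcards are possible */

    for (size_t i = 0; i < n; i++) {
        const struct sock_pair *s = &socks[i];
        if (s->local_port != seg->local_port)
            continue;
        if (!field_match(s->local_addr, seg->local_addr) ||
            !field_match(s->foreign_addr, seg->foreign_addr) ||
            !field_match(s->foreign_port, seg->foreign_port))
            continue;
        int wild = (s->local_addr == WILDCARD) + (s->foreign_addr == WILDCARD)
                 + (s->foreign_port == WILDCARD);
        if (wild < best_wild) {
            best = (int) i;
            best_wild = wild;
        }
    }
    return best;
}
```

With a listening socket *.21 and two connected sockets that differ only in the client's ephemeral port, a segment from the second client goes to the second connected socket, while a SYN from a new client falls through to the listening socket.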

UDP - User Datagram Protocol. UDP is a simple transport layer protocol, described in RFC 768. The application writes a datagram to a UDP socket, and the datagram is encapsulated as an IPv4 datagram. UDP is unreliable. UDP is truly connectionless.

TCP - Transmission Control Protocol. TCP is described in RFC 793. TCP provides connections between clients and servers. TCP is reliable. TCP sequences the data. TCP provides flow control using windows. TCP is full-duplex.

PORT NUMBERS

Well-known port numbers: 0 - 1023.

Registered port numbers: 1024 - 49151.

• NOT controlled by IANA, but registered and maintained as a convenience to the community.

• When possible, UDP and TCP use the same registered port number for the same service.

• Example: the BEA EJB server uses ports 7001 and 7002 (SSL).

Ephemeral port numbers: 49152 - 65535. Ephemeral port numbers are used to set up transitory client-server connections.
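A small classifier for the three ranges listed above; the enum and function names are hypothetical. Operating systems do not necessarily allocate ephemeral ports from exactly 49152 - 65535 (Linux, for example, uses 32768 - 60999 by default), so this sketch classifies by the IANA ranges only.

```c
#include <stdint.h>

enum port_class { WELL_KNOWN, REGISTERED, EPHEMERAL };

/* Classify a port number using the IANA ranges. */
enum port_class classify_port(uint16_t port)
{
    if (port <= 1023)
        return WELL_KNOWN;   /* 0 - 1023 */
    if (port <= 49151)
        return REGISTERED;   /* 1024 - 49151 */
    return EPHEMERAL;        /* 49152 - 65535 */
}
```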

The various port numbers are used by TCP to distinguish between the concurrent processes using TCP services at any given time.

Remember that the socket pair for any TCP connection is a four-tuple that defines both endpoints of the connection: local IP address, local TCP port, foreign IP address, foreign TCP port.

Returning to the server example of Class 2 where a child is spawned to handle each new connection.

The connected socket uses the same local port as the listening socket (and on a multihomed server, the local address is filled in with the specific address on which the connection arrived).

When a second client process requests a connection with the same server, TCP on the client will assign a new ephemeral port.

Therefore, on the server, the 4-tuple is now different. NB: TCP CANNOT demultiplex incoming segments by examining only the destination port number.

A multihomed server is one that has multiple IP addresses.

Buffer sizes and limitations. The maximum size of an IPv4 datagram is 65535 bytes, including the header. Many networks have an MTU (maximum transmission unit), which can be dictated by the hardware. The Ethernet MTU is 1500 bytes.

The smallest MTU in the path between two hosts is called the path MTU (in the current era, usually the Ethernet MTU).

If the size of an IP datagram exceeds the link MTU, fragmentation is performed.

Fragments are not reassembled until they reach the final destination.

In IPv4, both hosts and routers can perform fragmentation.

Buffer sizes and limitations (continued). If the DF (Don't Fragment) bit in the IPv4 header is set, the datagram must not be fragmented by the sending host or by any router. A router that receives such a datagram whose size exceeds the outgoing link MTU generates an ICMPv4 "destination unreachable, fragmentation needed but DF bit set" error.

IPv4 defines a minimum reassembly buffer size: the minimum datagram size that every implementation is guaranteed to support. For IPv4 this is 576 bytes.

TCP has an MSS (maximum segment size) that announces to the peer TCP the maximum amount of data that the peer can send per segment.

The MSS is often set to the interface MTU minus the fixed sizes of the IP and TCP headers.

The TCP header is 20 bytes (without options). The maximum amount of TCP data in an IPv4 datagram is therefore 65495 bytes (65535 minus the 20-byte IP header and the 20-byte TCP header).
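The MSS arithmetic above, as a sketch that assumes 20-byte IPv4 and TCP headers with no options; mss_for_mtu and the two macros are hypothetical names.

```c
#define IPV4_HDR_LEN 20   /* IPv4 header without options */
#define TCP_HDR_LEN  20   /* TCP header without options */

/* MSS for a given MTU: MSS = MTU - 20 - 20.
 * Returns 0 if the MTU cannot hold both headers plus any data. */
unsigned mss_for_mtu(unsigned mtu)
{
    if (mtu <= IPV4_HDR_LEN + TCP_HDR_LEN)
        return 0;
    return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```

For Ethernet, mss_for_mtu(1500) is 1460; for the 576-byte minimum reassembly buffer size it is 536; for a maximum-size datagram (65535) it is 65495.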

When an application writes data to a TCP socket, the data passes through these stages:

application: application buffer (any size)

TCP: socket send buffer (SO_SNDBUF), from which MSS-sized TCP segments are built (MSS normally <= MTU - 40 for IPv4)

IP: MTU-sized IPv4 datagrams

TCP: when an application writes to a TCP socket, the kernel copies the data from the application buffer into the socket send buffer. If there is insufficient room, the application is put to sleep. (signal)
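The size of the send buffer can be queried with getsockopt and the SO_SNDBUF option. A minimal sketch, assuming a POSIX system; the function name default_sndbuf is hypothetical, and the value returned differs between systems.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Return the send buffer size (SO_SNDBUF) of a new TCP socket, or -1. */
int default_sndbuf(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int size = 0;
    socklen_t len = sizeof(size);   /* value-result: room available / bytes stored */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) < 0)
        size = -1;
    close(fd);
    return size;
}
```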

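UDP: since UDP is unreliable, it does not need to keep a copy of the application's data, so a UDP socket has no actual send buffer.

The send buffer option (SO_SNDBUF) for a UDP socket is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket.

UDP prepends its 8-byte header and passes the datagram to IP, which encapsulates it in an IP datagram.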
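The 8-byte UDP header consists of four 16-bit fields (source port, destination port, length, checksum), all in network byte order. The sketch below builds one in user space only to show the layout; normally the kernel does this. The struct and function names are hypothetical, and the checksum is left at 0, which IPv4 permits to mean "no checksum computed".

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* The 8-byte UDP header (RFC 768); every field in network byte order. */
struct udp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;     /* header + data, in bytes */
    uint16_t checksum;   /* 0 = not computed (allowed for IPv4) */
};

/* Prepend a UDP header to data, writing the result into out.
 * out must have room for 8 + len bytes, and 8 + len must fit in 16 bits.
 * Returns the total number of bytes written. */
size_t udp_encap(uint8_t *out, uint16_t sport, uint16_t dport,
                 const void *data, uint16_t len)
{
    struct udp_hdr h;

    h.src_port = htons(sport);
    h.dst_port = htons(dport);
    h.length   = htons((uint16_t)(sizeof h + len));
    h.checksum = 0;
    memcpy(out, &h, sizeof h);          /* header first ... */
    memcpy(out + sizeof h, data, len);  /* ... then the payload */
    return sizeof h + len;
}
```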

Protocol usage by common Internet applications:

Application        Protocol
Ping, Traceroute   ICMP
OSPF               IP
BOOTP              UDP
DHCP               UDP
TFTP               UDP
SMTP               TCP
SNMP               UDP
FTP                TCP
HTTP               TCP
Telnet             TCP
DNS                UDP/TCP
NFS                UDP/TCP
RPC                UDP/TCP

Elementary sockets: the API

Socket functions employ pointers to socket address structures as arguments.

IPv4 Socket Address Structure (defined in <netinet/in.h>)

struct in_addr {
    in_addr_t s_addr;              /* 32-bit IPv4 address, network byte ordered */
};

struct sockaddr_in {
    uint8_t        sin_len;        /* length of structure */
    sa_family_t    sin_family;     /* AF_INET */
    in_port_t      sin_port;       /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;       /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8];    /* unused */
};

Socket address structure details. The sin_len member comes from 4.3BSD; POSIX.1g does not require it. Many vendors do not support a length field for socket address structures. The data type shown, uint8_t, is typical.

Even when the length field is present, the network programmer need not set or examine it unless working with routing sockets.

POSIX.1g requires only three members (sin_family, sin_addr, sin_port). Most implementations add the sin_zero member so that the structure is at least 16 bytes in size.

Datatypes:

in_addr_t: an unsigned integer type of at least 32 bits (normally uint32_t)

in_port_t: an unsigned integer type of at least 16 bits (normally uint16_t)

sa_family_t: any unsigned integer type

Note: four socket functions pass a socket address structure from the process to the kernel (bind, connect, sendto, sendmsg). In BSD implementations, all of them go through the sockargs function.

In these cases sockargs copies the socket address structure from the process and sets its sin_len member to the size of the structure that was passed.

Five socket functions pass a socket address structure from the kernel to the process (accept, recvfrom, recvmsg, getpeername, getsockname); all of them set the sin_len member before returning to the process.

These functions also fill in the structure with the relevant socket address information (for example, accept returns the client's address and getsockname returns the local address).

Byte ordering functions: little-endian and big-endian.

The memory model is a stack with addresses increasing from top to bottom, i.e., the first entry of the stack is at address 0 and the last at address n. Each entry in the stack is the smallest addressable unit for that particular machine.

[Figure: the common memory model; address values increase from top (address 0, then 0 + 1, ...) to bottom (address n).]

Byte Ordering (continued).

A number is represented digitally exactly as it is in decimal notation, i.e., the left-most digits are the highest order: 1024 means $1 \times 10^3 + 0 \times 10^2 + 2 \times 10^1 + 4 \times 10^0$.

Similarly, 0400h means 1024 in a 16-bit unsigned int ($4 \times 16^2 = 1024$).

Unfortunately the curse of downward compatibility haunts the manufacture of computational machines. Many families of machines were born as 8 bit machines. (Intel).

When Intel went to 16 bit machines they wanted these devices to be able to run the 8 bit code. Marketing rears its ugly head.

Similar situation with IBM and the CURIOS from earlier machines being used as instructions.

Byte Ordering (continued). This ‘downward compatibility’ means that when the manufacturers built 16-bit engines, they wanted the code for the 8-bit engines to be capable of running on the new engines.

So they stored 16-bit data in an order called ‘little-endian’: the low-order byte is stored at the lower address in memory.

This was done so that the 8-bit code, which processed 16-bit ints 8 bits at a time, could grab the ‘lower order’ byte first. The rules of arithmetic demand this order of action: a carry propagates from the low-order byte into the high-order byte.

[Figure: 0400h stored little-endian: 00 at the lowest-order address, followed by 04.]

BYTE ORDERING (continued)

However, some machines (SPARC) were conceived as 32-bit machines. Hence they stored their data as it should be stored, with the high-order byte at the lowest memory address. This is big-endian.

The engines designed later in the era did not have to address upward code compatibility. Hence their storage protocols were based on sensible decisions and not on marketing pressures.

[Figure: 0400h stored big-endian: 04 at the lowest-order address, followed by 00.]

But we as software engineers are burdened with the distinction of which storage protocol is being used on a particular machine.

Linux on Intel processors is little-endian (byte order is a property of the processor, not of the operating system); Solaris and HP-UX, designed to work on 32-bit engines, are big-endian.
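Which order a given host uses can be checked at run time by storing 0400h (1024) in a 16-bit integer and looking at the byte at the lowest address. A minimal sketch; host_is_little_endian is a hypothetical name, and the answer depends on the processor the code runs on (1 on Intel x86).

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the host stores multi-byte integers little-endian
 * (low-order byte at the lowest address), 0 if big-endian. */
int host_is_little_endian(void)
{
    uint16_t value = 0x0400;                /* 1024 */
    unsigned char bytes[2];

    memcpy(bytes, &value, sizeof value);    /* bytes[0] is the lowest address */
    return bytes[0] == 0x00 && bytes[1] == 0x04;
}
```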

Network programming must specify a network byte order. The sending and receiving protocol stacks must agree on the order for transmission. Internet protocols use BIG-ENDIAN (Posix 1.g).

uint16_t htons(uint16_t host16bitvalue) returns the value in network byte order.

uint16_t ntohs(uint16_t net16bitvalue) returns the value in host byte order. (32-bit versions, htonl and ntohl, are also available.)
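Because network byte order is big-endian, htons places the high-order byte at the lowest address on every host (on a big-endian host it changes nothing). A minimal sketch; first_byte_in_network_order is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert a port to network byte order and return its first byte in memory.
 * For 0x0400 (1024) this is 0x04 on any host. */
unsigned char first_byte_in_network_order(uint16_t host_port)
{
    uint16_t net = htons(host_port);
    unsigned char bytes[2];

    memcpy(bytes, &net, sizeof net);
    return bytes[0];
}
```

ntohs(htons(x)) == x and ntohl(htonl(x)) == x hold on every host, which is what lets the same source code run on both byte orders.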

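Returning to the socket address structure: the IPv4 address and the TCP port number are ALWAYS stored in network byte order.

serv.sin_addr refers to the IPv4 address as a structure (struct in_addr).

serv.sin_addr.s_addr refers to the same IPv4 address as a 32-bit integer (in_addr_t).

The sin_zero member is unused, but it is always set to 0; by convention, the whole structure is zeroed before it is filled in.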

Socket address structures are used only on the host: certain values in them are used for communication, but the structures themselves are never transmitted. Always pass them by reference, using a pointer to the generic socket address structure:

struct sockaddr {
    uint8_t     sa_len;
    sa_family_t sa_family;     /* address family: AF_xxx value */
    char        sa_data[14];   /* protocol-specific address */
};

Example of using generic socket address structure.

int bind(int sockfd, const struct sockaddr *myaddr, socklen_t addrlen);

Therefore any call to bind must cast the pointer to the protocol-specific socket address structure to a pointer to the generic socket address structure.

Assume

struct sockaddr_in serv; // therefore

bind(sockfd, (struct sockaddr *) &serv, sizeof(serv) );

Socket address structures differ in length. Therefore, when passing a pointer to a socket address structure as an argument to a socket function, also pass its length as a separate argument.

When the structure is passed from the process to the kernel (bind, connect, sendto), pass the length as an integer value. When it is passed from the kernel to the process (accept, recvfrom, getsockname, getpeername), pass a pointer to an integer location in which the kernel can store the length.

This is because the size is both a value (when the function is called, it tells the kernel the size of the structure so that it does not write past the end) and a result (it tells the process how much information the kernel actually stored in the structure).

This is called a value-result argument.
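The value-result pattern can be seen with bind and getsockname. The sketch binds a TCP socket to the wildcard address with port 0, so that the kernel chooses an ephemeral port, passes the length to bind as a plain value, and then passes a pointer to the length to getsockname, which reports how many bytes it stored. The function name bound_ephemeral_port is hypothetical, and error handling is minimal.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the kernel-chosen port in host byte order, or -1 on error.
 * *lenp receives the length the kernel stored (the "result"). */
int bound_ephemeral_port(socklen_t *lenp)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);             /* also zeroes sin_zero */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* wildcard address */
    addr.sin_port = htons(0);                  /* let the kernel choose */

    /* process -> kernel: the length is passed by value */
    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* kernel -> process: the length is passed by reference */
    socklen_t len = sizeof addr;               /* value: room available */
    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    *lenp = len;                               /* result: bytes stored */
    return ntohs(addr.sin_port);
}
```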

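Byte manipulation functions. A byte is usually an 8-bit quantity, but NOT always; use the term ‘octet’ for precision.

The b functions, from 4.2BSD, are still provided by almost any system that supports the socket functions:

void bzero(void *dest, size_t nbytes);

• sets the specified number of bytes at dest to zero.

void bcopy(const void *src, void *dest, size_t nbytes);

• copies nbytes from src to dest (overlapping regions are handled correctly).

int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);

• returns 0 if the two byte strings are identical, nonzero otherwise.

These correspond to the more familiar ANSI C functions memset, memmove/memcpy, and memcmp.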

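The const qualifier prevents assignments to the object through that name or pointer, and any other modification of it.

Example: a const pointer (char *const p) cannot itself be modified, though the object to which it points can be. Conversely, a pointer to const (const char *p) can be made to point elsewhere, but the object it points to cannot be modified through it.

char const *str_arb = "Hello World";   /* pointer to const char */

char *str3 = str_arb;   // illegal operation: discards the const qualifier

Recommendation: use the ANSI C mem functions (memset, memcpy, memcmp) instead of the BSD b functions.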
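A minimal sketch of the recommendation, assuming an IPv4 socket address structure: memset, memcpy, and memcmp take the place of bzero, bcopy, and bcmp. The helper names are hypothetical. Note that memcpy takes the destination first, the reverse of bcopy, and that memcpy, unlike bcopy, is not guaranteed to handle overlapping regions (use memmove for that).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fill in a socket address structure; memset replaces bzero. */
void init_loopback_addr(struct sockaddr_in *sa, uint16_t port)
{
    memset(sa, 0, sizeof *sa);                     /* bzero(sa, sizeof *sa) */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
}

/* Copy src to dst and report whether the two compare equal.
 * memcpy(dst, src, n) replaces bcopy(src, dst, n);
 * memcmp(...) == 0 replaces bcmp(...) == 0. */
int copy_and_compare(struct sockaddr_in *dst, const struct sockaddr_in *src)
{
    memcpy(dst, src, sizeof *src);
    return memcmp(dst, src, sizeof *src) == 0;
}
```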

Address conversion functions.

Used to convert Internet addresses between ASCII strings and network byte ordered binary values (such as stored in socket address structures).

inet_aton, inet_ntoa, and inet_addr convert an IPv4 address between a dotted-decimal string (e.g., 192.84.247.23) and its 32-bit network byte ordered (NBO) binary value.

The newer functions inet_pton and inet_ntop handle both IPv4 and IPv6. The book uses these functions.

int inet_aton(const char *strptr, struct in_addr *addrptr); converts a C character string into its 32-bit NBO value, stored at addrptr. It returns 1 if the string was valid, 0 on error.

in_addr_t inet_addr(const char *strptr); does the same conversion but returns the 32-bit NBO value as its function result. • Unfortunately, on error it returns INADDR_NONE (255.255.255.255), which means that the IPv4 limited broadcast address cannot be handled by this function (the result is the same as the error value). The limited broadcast address is still used.

• inet_addr is deprecated; new code should use inet_aton.

• Question: what does ‘deprecated’ mean?

char *inet_ntoa(struct in_addr inaddr); converts a 32-bit NBO IPv4 address into its corresponding dotted-decimal string.
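The older functions in action: inet_aton and inet_ntoa for a round trip, and inet_addr showing why the limited broadcast address is a problem. A minimal sketch with hypothetical helper names. inet_ntoa returns a pointer to a static buffer, so its result must be copied if it is needed after the next call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Dotted-decimal -> 32-bit NBO (inet_aton) -> dotted-decimal (inet_ntoa).
 * Returns 1 if the round trip reproduces the string, 0 otherwise
 * (including when the input is not a valid address). */
int round_trip_ipv4(const char *dotted)
{
    struct in_addr addr;

    if (inet_aton(dotted, &addr) == 0)   /* 0 means an invalid string */
        return 0;
    return strcmp(inet_ntoa(addr), dotted) == 0;
}

/* inet_addr cannot tell 255.255.255.255 apart from its error value. */
int broadcast_looks_like_error(void)
{
    return inet_addr("255.255.255.255") == INADDR_NONE;
}
```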

•Homework assignment: write a C program that converts a dotted-decimal string to its NBO value and back.

•Homework: read Chapters 3 and 4 of Stevens, Volume 1. Be prepared to discuss them in detail.

•inet_pton works with both IPv4 and IPv6. The ‘p’ stands for presentation and the ‘n’ for numeric.

int inet_pton(int family, const char *strptr, void *addrptr);

The family argument is either AF_INET or AF_INET6, and the family must be supported. inet_pton returns 1 on success, 0 if the input is not a valid presentation format, and -1 on error.

const char *inet_ntop(int family, const void *addrptr, char *strptr, size_t len);

The len argument is the size of the destination buffer (to prevent buffer overflow). The following two definitions are in <netinet/in.h>:

#define INET_ADDRSTRLEN 16 // for IPv4 dotted decimal

#define INET6_ADDRSTRLEN 46 // for IPv6 hex string

If len is too small, a null pointer is returned and errno is set to ENOSPC.

On success, the pointer strptr (the destination buffer) is returned.

The family is either AF_INET or AF_INET6, integer constants defined in <sys/socket.h>; if the family is not supported, both functions return an error with errno set to EAFNOSUPPORT.
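inet_pton and inet_ntop used together, which also normalizes the presentation form (an IPv6 address written out in full comes back with its zero fields compressed). A minimal sketch; normalize_addr is a hypothetical helper, and the union simply provides properly aligned storage that is big enough for either address type.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Presentation -> numeric -> presentation for the given family.
 * Returns buf on success; NULL if src is invalid for the family, the
 * family is unsupported, or len is too small for the result. */
const char *normalize_addr(int family, const char *src, char *buf, socklen_t len)
{
    union {                           /* aligned storage for either family */
        struct in_addr  v4;
        struct in6_addr v6;
    } numeric;

    if (inet_pton(family, src, &numeric) != 1)     /* 0: bad string, -1: bad family */
        return NULL;
    return inet_ntop(family, &numeric, buf, len);  /* NULL (ENOSPC) if too small */
}
```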

Summarizing: All of the presented functions do nothing more than conversions between presentation and numeric formats.

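Note: code that calls inet_ntop is protocol dependent, because the caller must know the address family and pass a pointer to the binary address inside the matching socket address structure. The book provides a protocol-independent function, sock_ntop, which works by examining the structure's address family and calling the appropriate function.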
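The idea behind sock_ntop can be sketched as follows. This is not the book's code: my_sock_ntop is a hypothetical function, and its "addr:port" / "[addr]:port" output format is an assumption. It switches on sa_family, picks the address member of the matching structure, and only then calls inet_ntop, so the caller needs no protocol-specific knowledge.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Format a generic socket address as "addr:port" (IPv4) or
 * "[addr]:port" (IPv6). Returns buf, or NULL on an unsupported
 * family or a buffer that is too small. */
char *my_sock_ntop(const struct sockaddr *sa, char *buf, size_t len)
{
    char host[INET6_ADDRSTRLEN];
    const void *addr;
    unsigned port;
    int n;

    switch (sa->sa_family) {              /* examine the structure first */
    case AF_INET: {
        const struct sockaddr_in *sin = (const struct sockaddr_in *) sa;
        addr = &sin->sin_addr;
        port = ntohs(sin->sin_port);
        break;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *) sa;
        addr = &sin6->sin6_addr;
        port = ntohs(sin6->sin6_port);
        break;
    }
    default:
        return NULL;                      /* family not supported */
    }

    if (inet_ntop(sa->sa_family, addr, host, sizeof host) == NULL)
        return NULL;
    if (sa->sa_family == AF_INET6)
        n = snprintf(buf, len, "[%s]:%u", host, port);
    else
        n = snprintf(buf, len, "%s:%u", host, port);
    return (n < 0 || (size_t) n >= len) ? NULL : buf;
}
```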

Stream sockets and access. A read or write on a stream socket may transfer fewer bytes than requested (a short count); the amount is not deterministic, because kernel buffer limits vary as buffers fill and empty (and the window size varies).

A short count is always possible on a read, but on a write it normally happens only if the socket is nonblocking.

Therefore use readn and writen when accessing a stream socket.

ssize_t readn (int filedes, void *buff, size_t nbytes);

filedes is a file descriptor returned by the socket function

The readn function source

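#include <errno.h>
#include <unistd.h>

ssize_t readn(int fd, void *vptr, size_t n)   /* read n bytes from a descriptor */
{
    size_t  nleft;
    ssize_t nread;
    char   *ptr;

    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;   /* interrupted by a signal: call read() again */
            else
                return -1;   /* error, errno set by read() */
        } else if (nread == 0)
            break;           /* EOF */
        nleft -= nread;
        ptr   += nread;
    }
    return n - nleft;        /* return >= 0 */
}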

The writen function is very similar to readn: it loops on the number of bytes left to write.
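A writen along the same lines as readn, assuming the same conventions (restart after EINTR, return -1 on any other error). Treat it as a sketch rather than the book's exact listing.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write n bytes to a descriptor, looping on short counts.
 * Returns n on success, -1 on error. */
ssize_t writen(int fd, const void *vptr, size_t n)
{
    const char *ptr = vptr;
    size_t nleft = n;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, ptr, nleft);
        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;           /* interrupted by a signal: call write() again */
            return -1;              /* error */
        }
        nleft -= (size_t) nwritten;
        ptr += nwritten;
    }
    return (ssize_t) n;
}
```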

readline is different: the book's first version calls read once for every byte, which is inefficient. It is better to read as much data as possible into a buffer and then examine the buffer one byte at a time.
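A buffered readline following that advice: one read fills a buffer, and bytes are then handed out from the buffer one at a time. A minimal sketch; buffered_read and the static buffer are hypothetical details, and because the buffer is a single static variable this version is neither thread-safe nor usable on two descriptors at once. Like fgets, it stores the newline and null-terminates the result.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

static char    rl_buf[4096];   /* one buffer for one descriptor (a simplification) */
static char   *rl_ptr;
static ssize_t rl_cnt;

/* Next byte from the buffer, refilling it with one large read() when empty.
 * Returns 1 with *c set, 0 on EOF, -1 on error. */
static ssize_t buffered_read(int fd, char *c)
{
    while (rl_cnt <= 0) {
        rl_cnt = read(fd, rl_buf, sizeof rl_buf);
        if (rl_cnt < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: try again */
            return -1;
        }
        if (rl_cnt == 0)
            return 0;              /* EOF */
        rl_ptr = rl_buf;
    }
    rl_cnt--;
    *c = *rl_ptr++;
    return 1;
}

/* Read one line into vptr (at most maxlen - 1 bytes plus '\0').
 * Returns the number of bytes stored (0 means EOF), or -1 on error. */
ssize_t readline(int fd, void *vptr, size_t maxlen)
{
    char *ptr = vptr;
    char c;

    if (maxlen == 0)
        return -1;
    while ((size_t) (ptr - (char *) vptr) + 1 < maxlen) {  /* room for '\0' */
        ssize_t rc = buffered_read(fd, &c);
        if (rc < 0)
            return -1;
        if (rc == 0)
            break;                 /* EOF */
        *ptr++ = c;
        if (c == '\n')
            break;                 /* newline is stored, like fgets */
    }
    *ptr = '\0';
    return ptr - (char *) vptr;
}
```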

Test to see if a descriptor is really a socket:

int isfdtype(int filedescriptor_tested, int file_descrip_type);

To test for a socket, file_descrip_type is S_IFSOCK. isfdtype returns 1 if the descriptor is of the specified type, 0 if not, and -1 on error.

Assignment on Chapter 3: Problems 3.1 through 3.3.

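TCP Client-Server

[Figure: simple TCP client and server. The client reads a line with fgets and sends it with writen; the server reads it with readline and writes it back with writen; the client reads the reply with readline and prints it with fputs.]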

The connection is full-duplex, which means that reads and writes can take place simultaneously (at least from a virtual perspective). At the datalink layer, transmission is truly one-directional.

This model is a valid representation of any ‘real’ server that can be imagined. FTP, HTTP, Telnet, all differ only in ‘what’ is done with the data read.

int main(int argc, char **argv)   // ah yes, the pointer to a pointer

listenfd = Socket(AF_INET, SOCK_STREAM, 0);

….

servaddr.sin_addr.s_addr = htonl(INADDR_ANY);

INADDR_ANY is the wildcard address; it tells the kernel that we will accept a connection destined for any local interface (in case the system is multihomed).

This is the one-or-any choice: the server cannot specify a list of multiple addresses, so it binds either one specific address or the wildcard, which is the ‘any’ choice.

Fork: Every process on a Unix system is created by the fork system call (except process 0, the swapper). Process 1 (init) is the ancestor of every other process in the system.

The process which calls the fork is the Parent, while the newly created process is called the Child.

A process may have many children but only one parent.

The kernel loads an executable file into memory during an exec call. The loaded process has text, data, and the stack. The data includes the bss (from "block started by symbol"), which indicates how much memory the kernel should allocate for uninitialized data; this memory is set to 0 at run time.

The stack consists of logical stack frames, pushed when a function is called and popped when it returns.

The stack is created automatically and grows dynamically at run time (the kernel monitors it).

Fork (continued).

if (fork() == 0)
    execl("copy", "copy", argv[1], argv[2], (char *) 0);
wait((int *) 0);

The fork system call creates a new process. The new process gets a return value of 0 from fork and invokes execl to execute the program copy (which overlays the address space of the child process).

If execl succeeds, it never returns (the child now executes in a new address space).

Meanwhile, the parent receives a nonzero return value (the child's PID) from fork and calls wait, thereby suspending execution until copy finishes.

pid = fork();

On return from a fork system call, the two processes have identical copies of their user-level context except for the return value of pid.

During fork, the kernel:

–allocates a slot in the process table for the new process;

–assigns a unique PID to the child process;

–makes a copy of the context of the parent process;

–increments the file table and inode table counters for files associated with the process, so the child has all the file descriptors of the parent;

–returns the PID of the child to the parent process (and 0 to the child).
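The fork return values can be checked with a short sketch. value_seen_by_child is a hypothetical helper: the child, which receives 0 from fork, reports a variable it inherited from the parent through its exit status, and the parent, which receives the child's PID, collects that status with waitpid. Only the low 8 bits of an exit status survive, so the sketch is limited to values from 0 to 255.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the child saw in its copy of the parent's variable,
 * or -1 on error. */
int value_seen_by_child(int value)
{
    int inherited = value;        /* set in the parent before fork */
    pid_t pid = fork();

    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0) {
        /* child: fork returned 0; 'inherited' is the child's own copy.
         * _exit avoids flushing stdio buffers duplicated from the parent. */
        _exit(inherited & 0xff);
    }

    /* parent: fork returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```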

To monitor the status of the processes on a Unix machine, use the ps command. ps -f produces a full-format listing (including each process's PID and parent PID) of the processes you are running.

ASSIGNMENT: DUE SEPTEMBER 6. Write, compile, and test a C program that uses fork to produce a child process. Have the parent process output the PID of the child. Have the child process output a value that was defined in the parent (proof that the child received a complete copy of the parent's process space).

GRADUATE STUDENTS: take this one level further. Have the child fork another process. Have each parent output the PID of its child, and have each child output a value that was defined in its parent (the grandparent).

The concurrent server model:

pid_t pid;
int   listenfd, connfd;

listenfd = Socket(....);
Bind(listenfd, .....);
Listen(listenfd, LISTENQ);     /* LISTENQ: the book's backlog constant */

for ( ; ; ) {
    connfd = Accept(listenfd, ....);   /* blocks until a client connects */
    if ( (pid = fork()) == 0) {
        Close(listenfd);   /* child closes listening socket */
        doit(connfd);      /* process the request */
        Close(connfd);     /* done with this client */
        exit(0);           /* child terminates */
    }
    Close(connfd);         /* parent closes connected socket */
}

Concurrent server. The parent closes the connected socket, since the child handles this new client.

doit does whatever is required to service the client. When doit returns, the connected socket is explicitly closed in the child. Because exit closes all open descriptors, this close is NOT required; it is mostly a matter of style.

Why doesn't the parent's close of connfd cause a FIN to be sent?

To understand this we must understand reference counts.

Reference Counts. Every file or socket has a reference count, maintained in the file table entry. It is a count of the number of descriptors currently open that refer to the particular file or socket.

After socket returns, the file table entry associated with listenfd has a reference count of 1. After accept returns, the file table entry associated with connfd has a reference count of 1. After fork returns, both descriptors are shared (duplicated) between the parent and the child.

• This means that the file table entries for both now have a reference count of 2.

Reference Counts (continued). Therefore, when the parent closes connfd, the kernel just decrements the reference count from 2 to 1. A real close of the socket (including sending the FIN) does NOT take place until the reference count reaches 0.
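Reference counting can be observed with a connected pair of Unix-domain stream sockets standing in for the TCP connection. This is an assumption of the sketch: no FIN segment is involved, but end-of-file is delivered under the same rule. After fork both processes hold the "connfd" end; the parent closes its copy first, yet the child can still send on it, and the "client" end sees end-of-file only after the child's close drops the count to 0. refcount_demo is a hypothetical name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of bytes the "client" received before end-of-file,
 * or -1 on error. buf must be larger than the message. */
ssize_t refcount_demo(char *buf, size_t len)
{
    int sv[2];   /* sv[0] plays connfd, sv[1] plays the client */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();            /* sv[0] now has a reference count of 2 */
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                /* child: serves this client */
        const char msg[] = "hello";
        close(sv[1]);              /* the child does not need the client end */
        if (write(sv[0], msg, sizeof msg - 1) < 0)
            _exit(1);
        close(sv[0]);              /* last reference: the peer now sees EOF */
        _exit(0);
    }

    close(sv[0]);                  /* parent's close: count 2 -> 1, no EOF yet */
    ssize_t total = 0, n;
    while ((n = read(sv[1], buf + total, len - (size_t) total)) > 0)
        total += n;                /* the "client" reads until EOF */
    close(sv[1]);
    waitpid(pid, NULL, 0);
    return n < 0 ? -1 : total;
}
```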

[Figure: status of client-server after return from accept. The client's connect() has established a connection to the server, which holds listenfd and connfd.]

[Figure: status of client-server after fork returns. The parent and the child each hold listenfd and connfd; the connection is shared.]

[Figure: status of client-server after parent and child close the appropriate sockets. The parent holds only listenfd; the child holds only connfd, which carries the connection to the client.]

This is the desired final state of the sockets. The child is handling the connection with the client, and the parent can call accept again on the listening socket to handle the next client connection.
