Tutorial guide to Unix sockets for network communications


David Coffield and Doug Shepherd outline the use of Unix sockets for networking

This paper presents a tutorial on the socket interprocess communication facility provided with 4.2 BSD Unix systems; sockets allow distributed applications to be developed between Unix hosts over a network. The use of the necessary system calls and supporting library routines is explained and illustrated by development of a simple socket application that provides a datagram-based remote command execution facility.

Keywords: 4.2 BSD Unix, sockets, local area networks, interprocess communication, client-server model, TCP/IP

Interprocess communication (IPC) facilities are essential in the construction of distributed computing applications. The Berkeley 4.2 BSD release of the Unix operating system offers a major strength in the IPC facilities it provides via an abstraction known as 'sockets'. This paper is a tutorial on those facilities based on the authors' own experiences with socket applications running over their Departmental Ethernet local network.

INTERNET PROTOCOL FAMILY

An implementation of the DARPA Internet Protocol (IP) family¹ provides the means for communication between 4.2 BSD Unix systems. As such, some background knowledge on this protocol is useful.

The IP was originally developed in 1973 for use over the Arpanet and was designed around the premise that few assumptions can be made about the type of service available from any given network. The result was a simple datagram* service protocol that permits gateways connecting networks to make routing decisions based on globally assigned source and destination addresses and provides hosts with a simple service that can be augmented with higher level protocols, if necessary, to support particular applications.

Department of Computing, University of Lancaster, Bailrigg, Lancaster LA1 4YR, UK

0140-3664/87/010021-09 $03.00 ©

vol 10 no 1 february 1987

The IP implements two basic functions: addressing and fragmentation. The model of operation is that each host possesses an Internet module that has common rules for interpreting address fields and for fragmenting/reassembling Internet datagrams. The protocol handles each datagram as an independent entity unrelated to any other and uses four key mechanisms in providing its service:

• type of service
• time to live
• options
• header checksum

Figure 1 shows how the IP ties in with 4.2 BSD. (Internet uses the concept of encapsulation to move packets across each constituent network of a multinetwork system, permitting the constituent network to function independently of the details of the IP, similar to Xerox's 'Pup' protocol².)

[Figure 1, layers from top to bottom: FTP, Telnet / UDP, TCP / IP / Underlying network]

Figure 1. Internet protocol family and 4.2 BSD; FTP and Telnet: two of the application level facilities offered on 4.2 BSD; UDP and TCP: upper level protocols implemented; IP: base protocol

*a method of transmission where a packet is sent with full addressing information attached but delivery is not guaranteed; analogous to a mail letter

1987 Butterworth & Co (Publishers) Ltd

• Addressing: There is a distinction between names, addresses and routes. A name is what is sought, an address shows where it is and a route shows how to get to it. IP deals primarily with addresses. Higher level protocols map from names to addresses. The Internet module maps Internet addresses to local network addresses. Lower level procedures map from local network addresses to routes. Addresses are a fixed length of 4 byte (32 bit). The first byte is the network number and is followed by a 3 byte local address. For human consumption this is often written in 'a.b.c.d' format.

• Fragmentation: Fragmentation is the division of a large datagram, originating in a network that allows a large packet size, into smaller datagrams when it reaches networks where it exceeds the maximum packet size. This is done by splitting the original packet into the minimum number of new packets and duplicating the original header field in each one, the difference being the addition of a sort of 'sequence number' type of field to allow the packets to be reassembled correctly at the other end. Packets may be marked 'don't fragment', but if fragmentation is found to be necessary then they are discarded.
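The addressing scheme described above (one network byte followed by a 3 byte local address, written as 'a.b.c.d') can be illustrated with a few lines of C. This is only a sketch: the function names are hypothetical and chosen for this illustration, and the address is assumed to be held as a 32 bit integer in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Network number under the scheme described in the text:
 * the first (most significant) byte of the 32 bit address. */
unsigned network_number(uint32_t addr)
{
    return (unsigned)((addr >> 24) & 0xFFu);
}

/* The remaining 3 byte local address. */
uint32_t local_address(uint32_t addr)
{
    return addr & 0x00FFFFFFu;
}

/* Write addr (host byte order) into out in 'a.b.c.d' form.
 * Returns the number of characters the full text needs,
 * exactly as snprintf does. */
int format_dotted(uint32_t addr, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "%u.%u.%u.%u",
                    (unsigned)((addr >> 24) & 0xFFu),
                    (unsigned)((addr >> 16) & 0xFFu),
                    (unsigned)((addr >> 8) & 0xFFu),
                    (unsigned)(addr & 0xFFu));
}
```

For instance, the address 0x0A000207 has network number 10 and is written 10.0.2.7.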

Internet is a simple datagram protocol. The user interface to the datagram service is made via the User Datagram Protocol (UDP)³. A virtual circuit service (a method of packet transmission where a route is established and all packets follow the same path thereafter; there are distinct 'call set up' and 'call close down' phases, analogous to a telephone call) also exists that lies above the IP. This is known as the Transmission Control Protocol (TCP)⁴.

IPC AND SOCKETS

Before 4.2 BSD, Unix was weak in the area of IPC facilities: the only standard mechanisms were pipes and signals. Pipes are restrictive because the two communicating processes must reside on the same machine and be related through a common ancestor. (This is not true in System V where the 'named pipe' facility exists.) Also, pipes are unidirectional.

With 4.2 BSD, processes may rendezvous in several ways: either through a Unix file system-like name space (where names are pathnames) or through a network name space. It is the latter that is of most interest as it allows communication between processes across local area networks (LANs) and wide area networks (WANs).

Many 4.2 BSD commands make use of the socket facilities in their operation. Examples include 'rwho', which shows all the users on the network, and 'ruptime', which shows the status of machines on the network. The 4.2 BSD implementation of the Courier Remote Procedure Call also makes use of sockets⁵.

Basic concepts

The basic building block for communication is the socket⁶. A socket is a bidirectional endpoint of communication to which a name may be bound. Each socket in use has a type and one or more associated processes. Sockets exist within communication domains.

The 4.2 BSD IPC supports two separate communication domains: the Unix domain and the Internet domain. The Unix domain is used for communication between processes residing on the same machine, whereas the Internet domain is used for communication between processes across networks. The Internet domain uses the DARPA standard protocols outlined in the previous section.

Sockets are classified according to the communication properties visible to a user. Processes are presumed to communicate only between sockets of the same type. Five types of socket have, so far, been identified of which three are implemented in 4.2 BSD:

• A stream socket provides bidirectional, reliable, sequenced, and unduplicated flow of data without record boundaries (cf. virtual circuits).

• A datagram socket supports bidirectional flow of data but this is not guaranteed to be sequenced, reliable, or unduplicated. A process receiving messages on a datagram socket may find messages duplicated and possibly in an order different from that in which they were sent. An important characteristic of a datagram socket is that record boundaries in the data are preserved. However, an equally important characteristic is that datagram sockets closely model the facilities found in many contemporary packet switched networks such as Ethernet. As such, and considering the low error rates in LANs, they are often an adequate means of IPC in such environments.

• A raw socket provides access to the underlying communication protocols which support socket abstractions. These are normally datagram oriented and are not intended for the general user. Rather, they are for those interested in developing new protocols or for gaining low level access to an existing one.

The other two socket types identified are the sequenced packet socket and the reliably delivered message socket. A sequenced packet socket is identical to a stream socket except that record boundaries are preserved. This interface is similar to that provided by the Xerox Network Systems (XNS) Sequenced Packet protocol⁷. The reliably delivered message socket is similar to the datagram socket but with reliable delivery.

The sequenced packet socket will be offered as part of the 4.3 release of BSD Unix.

SYSTEM CALLS AND LIBRARY ROUTINES

There are a number of system calls and library routines that provide the user interface to IPC.

System calls

Since IPC is based on the socket abstraction, the most important system call is socket. To create a socket it is necessary to say:

    socket_id = socket(domain, type, protocol);


The call creates a socket in the specified domain, of the requested type, and using the specified protocol. It returns a small integer number, a descriptor, for future reference to the socket. Two domains are implemented in 4.2 BSD: AF_UNIX (Address Family, Unix domain) and AF_INET (Internet domain). The former is for use between processes resident on the same machine and the latter for use between processes over networks (or on the same machine). Example calls are:

    socket_id = socket(AF_INET, SOCK_DGRAM, 0);

enabling datagrams (User Datagram Protocol (UDP)) to be used within the Internet domain. The third parameter of the call, a zero, means that the system will select the protocol most appropriate to the user's chosen domain.

    socket_id = socket(AF_INET, SOCK_STREAM, pp->p_proto);

meaning use stream sockets within the Internet domain. The 'pp->p_proto' refers to a field of a structure that contains the protocol type. The structure may be filled in, before the socket call, using the 'getprotobyname' library routine:

    struct protoent *pp; †
    pp = getprotobyname("tcp");

Using this routine makes programs more readable than just using the numeric identifier for a protocol.

Having created a socket, a name has to be affixed to it. Until this is done, processes cannot refer to a socket. A name is bound as follows:

    bind(socket_id, name, namelen);

Example calls:

    bind(socket_id, "/dev/foo", sizeof("/dev/foo") - 1);

demonstrates a Unix domain call and

    bind(socket_id, &server, sizeof(server));

demonstrates an Internet domain call. In the Unix domain 'name' is a Unix pathname whereas for the Internet domain 'server' is a structure of type 'sockaddr_in', one of the commonly used socket related structures, the contents of which must be completed within the user program. Figure 2 shows the component fields of the structure.

The bind call can fail if the port number is wrongly chosen. Port numbers should be greater than 1023 as numbers less than this are reserved for privileged processes (processes that are either owned by the superuser or have superuser status). The programming example shows how the user may select a correct port number.

[Figure 2, fields: Domain, Port, Address, Aliases]

Figure 2. 'sockaddr_in' structure

† The authors do not, of course, usually put declarations immediately before system calls or library routines. This is done purely to emphasize the structure type.
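The socket and bind steps described so far can be put together in one small function. The following is a minimal sketch rather than the authors' code: the name 'open_bound_udp' is hypothetical, the casts to 'struct sockaddr *' follow present-day prototypes, and 'inet_pton' is a later POSIX routine (not part of 4.2 BSD) assumed here for parsing the 'a.b.c.d' form.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a datagram socket in the Internet domain and bind it to the
 * address `dotted` ('a.b.c.d' form) and `port` (host byte order).
 * Returns the socket descriptor, or -1 on any failure. */
int open_bound_udp(const char *dotted, unsigned short port)
{
    struct sockaddr_in name;
    int fd;

    assert(dotted != NULL);

    memset(&name, 0, sizeof(name));
    name.sin_family = AF_INET;
    name.sin_port = htons(port);           /* port in network byte order */
    if (inet_pton(AF_INET, dotted, &name.sin_addr) != 1)
        return -1;                         /* not a valid dotted address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* 0: let the system select UDP */
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *)&name, sizeof(name)) < 0) {
        close(fd);                         /* e.g. port in use or reserved */
        return -1;
    }
    return fd;
}
```

A call such as open_bound_udp("127.0.0.1", 2001) would bind to port 2001 on the loopback address. On current systems a port of 0 asks the kernel to choose a free port, which is convenient for experiments.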

There are two models of communication: connection oriented (TCP virtual circuit type) and connectionless (UDP datagram). The client-server relationship provides the typical scenario where two processes make use of IPC facilities. One process, the client, requires a service provided by the other, the server. The client calls the server with the request, the server carries out the requested task and returns the result to the client.

After the socket and bind calls, the next step depends on whether a connection oriented or connectionless environment is required. (Note that there is no reason for a server not to provide both virtual circuit and datagram interfaces to the service it provides. This would require differing port numbers for each.) The connection oriented, virtual circuit, method is considered first.

Connection oriented environment

A client process requests services from the server by initiating a connection to the server's socket. The server process must listen, for incoming requests, on its socket when it is ready to provide its services.

The first thing the server does is execute a listen call:

    listen(server_socket_id, 5);

meaning listen on the socket identified by 'server_socket_id' and allow five connections to be queued for processing (any connection requests after five in the queue are simply ignored). Following the listen the server waits in an 'accept' call for an incoming connection:

    struct sockaddr_in from;
    fromlen = sizeof(from);
    newsd = accept(server_socket_id, &from, &fromlen);

The 'from' structure is completed with the client's details on an incoming call. The accept call returns a new socket descriptor so that the client and server are 'spliced' together leaving the original socket descriptor free to look for other clients. (In many applications the use of accept is followed by a 'fork', the child process carrying on with the new descriptor.)

A client attempts a connection, in the Internet domain, by:

    struct sockaddr_in server;
    connect(client_socket_id, &server, sizeof(server));

The server's details are filled in, by a mixture of library routines and assignments, before the connect call.

With the client and server connected, data transfer can take place. The standard 'read' and 'write' calls may be used here but the new calls 'send' and 'recv' should be used in preference, if only to highlight that sockets are being used:

    send(socket_id, buf, sizeof(buf), flags);
    recv(socket_id, buf, sizeof(buf), flags);


'buf' is a character array containing the data to be sent (or, for 'recv', the array into which incoming data is placed); 'flags' allows several options: for sending, it allows out-of-band data, a concept that will not be discussed here; for receiving, it allows data to be looked at without actually being read, so that it remains available to a later call.

Once a socket is no longer needed, it should be discarded by the 'close' call:

close(socket_id);

An attempt will still be made to receive pending data, even after a 'close'. If the user has no use for such data the socket may be forcibly closed by 'shutdown' (this is not the same as the system maintenance command shutdown; the clash of names is unfortunate).

The same server processes on each machine have the same port numbers; only the machine address is different. A client on one machine that wants to make use of a service provided on another machine can find out the necessary port/protocol information from the '/etc/services' file on its own machine by using various library routines.

To summarize the connection oriented method, an illustration of the calls the client and server execute, with respect to time, is shown in Figure 3.
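The order of calls in the connection oriented method can also be sketched in code. The example below is an illustration under several assumptions, not the authors' program: both ends live in one process and talk over the loopback address, the function name 'stream_round_trip' is hypothetical, and port 0 is used so that the kernel picks a free port (a convention of current systems). It works in a single process because the client's connection request waits in the listen queue until accept is called.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send msg from a client socket to a server socket in the same process.
 * Copies what the server side received into out and returns the byte
 * count, or -1 on any failure. */
long stream_round_trip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in server, from;
    socklen_t len = sizeof(server), fromlen = sizeof(from);
    int listener = -1, client = -1, conn = -1;
    long n = -1;

    assert(msg != NULL && out != NULL);

    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    server.sin_port = htons(0);            /* assumption: kernel picks a port */

    /* Server side: socket, bind, listen. */
    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&server, sizeof(server)) < 0) goto done;
    if (listen(listener, 5) < 0) goto done;
    /* Find out which port was actually assigned. */
    if (getsockname(listener, (struct sockaddr *)&server, &len) < 0) goto done;

    /* Client side: socket, connect. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&server, sizeof(server)) < 0) goto done;

    /* Server side: accept returns a new descriptor for this client,
     * leaving `listener` free for further connection requests. */
    conn = accept(listener, (struct sockaddr *)&from, &fromlen);
    if (conn < 0) goto done;

    /* Data transfer. A stream has no record boundaries, so a larger
     * message could arrive in several pieces; one recv is enough for
     * a short message in this sketch. */
    if (send(client, msg, strlen(msg), 0) < 0) goto done;
    n = (long)recv(conn, out, outlen, 0);

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return n;
}
```

In a real application the two halves would of course run as separate processes, normally on separate machines.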

Connectionless sockets

These have no 'call set up' and 'close down' phases. Instead, all packets include the destination address. The sockets are created and bound as before. (Connect, listen and accept are not used here.) To transfer data the 'sendto' and 'recvfrom' primitives are used.

    sendto(socket_id, buf, buflen, flags, &to, tolen);

sends the data in 'buf' to the socket whose details are contained in the sockaddr_in structure 'to'.

    recvfrom(socket_id, buf, buflen, flags, &from, &fromlen);

places the received data in 'buf' and fills details of where it came from into the 'from' structure (recvfrom blocks on calling).
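A connectionless exchange can be sketched in the same way. As before this is an illustration under assumptions, not the authors' code: both sockets are in one process on the loopback address, 'datagram_round_trip' is a hypothetical name, port 0 lets the kernel choose the receiver's port, and a receive timeout (the SO_RCVTIMEO option of current systems) is added because a datagram may be lost and recvfrom would otherwise block indefinitely.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Send msg as one datagram between two sockets in the same process.
 * Copies the received datagram into out and returns its length,
 * or -1 on failure or timeout. */
long datagram_round_trip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in to, from;
    socklen_t tolen = sizeof(to), fromlen = sizeof(from);
    struct timeval tv;
    int rx = -1, tx = -1;
    long n = -1;

    assert(msg != NULL && out != NULL);

    tv.tv_sec = 2;                         /* give up after two seconds */
    tv.tv_usec = 0;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(0);                /* assumption: kernel picks a port */

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0) goto done;
    if (bind(rx, (struct sockaddr *)&to, sizeof(to)) < 0) goto done;
    if (getsockname(rx, (struct sockaddr *)&to, &tolen) < 0) goto done;
    if (setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) goto done;

    /* No connect, listen or accept: the datagram carries its destination. */
    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&to, tolen) < 0)
        goto done;

    /* recvfrom blocks until a datagram arrives (or the timeout expires)
     * and records the sender's address in `from`. */
    n = (long)recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &fromlen);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

Because record boundaries are preserved, one sendto corresponds to exactly one recvfrom, provided the receiving buffer is large enough for the datagram.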

[Figure 3, calls in time order. Client: Socket, Bind, Connect, Send/recv, Close. Server: Socket, Bind, Listen, Accept (new id), Recv/send, Close]

Figure 3. Time diagram of connection based communication

In developing socket based applications it often becomes necessary to have a process that is listening on a socket and looking for input from the user's terminal, or another file descriptor, at the same time. This may be achieved by the 'select' system call. Details of these calls may be found in Section 2 of the Unix programmer's manual.
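A minimal sketch of the 'select' technique follows. The function name 'wait_for_input' and the millisecond timeout parameter are assumptions made for this illustration; the descriptors may be sockets, pipes or a terminal.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd_a or fd_b has input, or until timeout_ms elapses.
 * Returns the ready descriptor (fd_a is preferred if both are ready),
 * or -1 on timeout or error. */
int wait_for_input(int fd_a, int fd_b, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int maxfd = fd_a > fd_b ? fd_a : fd_b;

    assert(fd_a >= 0 && fd_b >= 0);
    assert(fd_a < FD_SETSIZE && fd_b < FD_SETSIZE);

    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);               /* e.g. a socket */
    FD_SET(fd_b, &readable);               /* e.g. the user's terminal */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select returns the number of ready descriptors, 0 on timeout. */
    if (select(maxfd + 1, &readable, NULL, NULL, &tv) <= 0)
        return -1;

    if (FD_ISSET(fd_a, &readable))
        return fd_a;
    return fd_b;
}
```

A server would typically call such a function in its main loop and then issue 'recvfrom' or 'read' on whichever descriptor is returned.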

Library routines

Complementing the system calls are a series of network library routines that exist primarily to allow the manipulation of network addresses. Locating a service on a remote host requires several levels of mapping before the client and server may communicate. The service name and remote host must be translated into a network address. The address must then be used to find a physical location and a route to the service (this is not really applicable in the LAN environment).

Routines are provided for the following:

• mapping host names to network numbers;
• network names to network numbers;
• protocol names to protocol numbers;
• service names to port numbers;
• the protocol to use in communication.

In 4.2 BSD a file, '/etc/hosts', contains Internet addresses in the 'a.b.c.d' format alongside the corresponding host names, and any aliases.

The routines 'gethostbyname', 'gethostbyaddr' and 'gethostent' return a 'hostent' structure, an intermediate data structure that may be used for filling in the sockaddr_in structures. Similarly, routines exist for mapping network names to numbers, and vice versa. These routines return a 'netent' structure. They are 'getnetbyname', 'getnetbyaddr' and 'getnetent'. For protocols a 'protoent' structure is completed by the 'getprotobyname', 'getprotobynumber' and 'getprotoent' routines.

Service names are a little more complicated. Services must exist on well known ports. In other words, a service must have the same port number on each machine. A file, akin to '/etc/hosts', exists, called '/etc/services', where the service <-> port number relationships are retained. 'Servent' structures are returned by the routines 'getservbyname', 'getservbyport', and 'getservent'.

The component fields of the 'hostent', 'netent', 'protoent' and 'servent' are shown later, in Figure 4. Details of these library routines may be found in Section 3N of the Unix programmer's manual.
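As an illustration of the service lookup, the sketch below wraps 'getservbyname' in a helper that falls back to a caller-supplied port when the service is not listed. The helper name 'service_port' and the fallback behaviour are assumptions for this example, not something the library itself provides.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/* Look up `service` for protocol `proto` (e.g. "udp") in the services
 * database. Returns the port in host byte order, or `fallback` if the
 * service is not listed. */
unsigned short service_port(const char *service, const char *proto,
                            unsigned short fallback)
{
    struct servent *sp;

    assert(service != NULL && proto != NULL);

    sp = getservbyname(service, proto);
    if (sp == NULL)
        return fallback;

    /* s_port is held in network byte order; convert it for local use. */
    return ntohs((unsigned short)sp->s_port);
}
```

When the port is destined for a sockaddr_in structure, 's_port' can be assigned to 'sin_port' directly, since both are in network byte order.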

Miscellaneous

By using the routines mentioned in the previous section application programs should rarely have to deal directly with addresses as such, thus decreasing the network dependency a little. However, there are other routines provided to tackle some other problems such as byte swapping. The routines in the previous section return addresses in what is known as network byte order. On a VAX the host byte order is the reverse of this and programs are required to byte swap quantities. This is especially true if the programmer wants to print out an address, perhaps in the debugging of a program. These routines are 'htonl', 'htons', 'ntohl' and 'ntohs' (host to network/network to host long/short). These routines may be found under byteorder(3N).
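The effect of the byte order routines can be seen by looking at the bytes of a converted port number. This is a small sketch with hypothetical helper names; it relies only on 'htons' and 'ntohs' and on the fact that network byte order places the most significant byte first.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16 bit port number into wire[0..1] in network byte order. */
void port_to_wire(uint16_t port, unsigned char wire[2])
{
    uint16_t net = htons(port);            /* host to network short */

    assert(wire != NULL);
    memcpy(wire, &net, sizeof(net));
}

/* Recover the host byte order port number from two wire bytes. */
uint16_t port_from_wire(const unsigned char wire[2])
{
    uint16_t net;

    assert(wire != NULL);
    memcpy(&net, wire, sizeof(net));
    return ntohs(net);                     /* network to host short */
}
```

Port 2001 is 0x07D1, so the wire bytes are 0x07 followed by 0xD1 whatever the byte order of the host. On a machine whose host order already matches network order the routines simply return their argument.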

REMOTE COMMAND EXECUTION FACILITY

Sufficient background knowledge has now been covered to write programs using sockets. This section illustrates their use by stepping through the design and coding of a simple datagram-based remote command execution facility. This example was coded and run over the authors' Departmental Ethernet on a selection of VAXes and SUNs, largely running 4.2 BSD Unix.

The datagram (UDP) interface is chosen simply because most of the IPC-based commands, such as 'rlogin', make use of the virtual circuit method and therefore there are already examples of stream based sockets. Two programs will be developed: 'rex.c' and 'rexd.c'. 'rex' is what the user will type to initiate a remote command and 'rexd' is the server process on the remote machine(s) that will wait for, and execute, commands. To use 'rex' the user will type:

    rex host command

The programs are fairly alike and as straightforward as possible. They do not handle interactive commands, and hence are similar to their 4.2 BSD counterparts, so the command set that can be issued is somewhat limited. However, they do provide a good first illustration of programming with sockets. The process starts with 'rexd.c' (the server process) and then describes the alterations necessary to provide 'rex.c'.

Server process - rexd

'rexd' first uses system calls to identify, for itself, the host it is running on. The program begins by '#include'ing some typical header files along with:

• sys/socket.h - definitions related to sockets: types, address families and options
• netinet/in.h - constants and structures defined by the Internet system
• arpa/inet.h - external definitions for Internet structures
• netdb.h - network structures

We then '#define' a port number which is called SERVER_PORT. This has been given the arbitrary number 2001 and it is well known by all the other machines in that it has been entered in their '/etc/services' files. Port numbers can generally be arbitrary but should be kept greater than 1023, as anything less than that is reserved for processes with privileged status. Next, some declarations:

    struct protoent *pp;
    struct hostent *server;
    struct sockaddr_in server_sock, client_sock;
    struct in_addr server_addr, client_addr;

The sockaddr_in structure was described earlier. Figure 4 shows the fields of the other structures. The 'main' function contains some more declarations and then begins properly:

    if (gethostname(hostname, 40) != 0) {
        fprintf(stderr, "I don't know who I am!\n");
        exit(1);
    }

fills in 'hostname' with the machine's name. Equipped with the host's name, it is possible to discover the address by:

    if ((server = gethostbyname(hostname)) == 0) {
        fprintf(stderr, "source host %s unknown\n", hostname);
        exit(1);
    }

The next line:

    bcopy(server->h_addr, (char *) &server_addr, server->h_length);

may be redundant in this example but in programs where successive calls to 'gethostbyname' are made it is vital. 'Gethostbyname' returns information to the same memory area on each call. Therefore, information derived from the previous call is overwritten, hence the importance of copying the machine's address. The process now knows all about the host on which it is running: name, address etc.

    pp = getprotobyname("udp");

chooses the UDP from the '/etc/protocols' file. It simply returns a numerical value that is equated with 'udp' but its use helps readability.

[Figure 4, fields of each structure:
(a) protoent: Protocol name, Aliases, Protocol number
(b) hostent: Host name, Aliases, Address type, Address length, Address
(c) servent: Service name, Aliases, Port number, Protocol to use
(d) netent: Network name, Aliases, Address type, Network number]

Figure 4. Fields of the other common socket structures: (a) 'protoent'; (b) 'hostent'; (c) 'servent'; (d) 'netent'

Next the socket through which communication will take place is created. How this was carried out was seen earlier.

    if ((server_id = socket(AF_INET, SOCK_DGRAM, pp->p_proto)) == -1) {
        perror("socket()");
        exit(1);
    }
    printf("server socket created\n");

The socket has 'server_id' as its descriptor. Now further socket details may be entered.

    server_sock.sin_family = AF_INET;      /* it's an Internet socket */
    server_sock.sin_addr = server_addr;    /* assign the address we preserved earlier */
    server_sock.sin_port = htons((u_short)SERVER_PORT);
                                           /* assign the port number. Note the use of */
                                           /* htons - "host to network short" */

With the structure fields completed, the socket may be bound.

    if (bind(server_id, &server_sock, sizeof(server_sock)) == -1) {
        perror("bind()");
        close(server_id);                  /* throw away descriptor */
        exit(1);
    }

Before the main processing loop two tasks are carried out:

• fork, to detach the process from its parent
• disassociate the process from the terminal on which it was initiated:

    int t;

    while ((t = fork()) == -1)             /* retry until the fork succeeds */
        sleep(1);
    if (t)
        exit(0);                           /* parent exits, child carries on */

    for (t = 0; t <= 20; ++t)              /* close everything but the socket */
        if (t != server_id)
            close(t);
    if ((t = open("/dev/tty", 2)) >= 0) {
        ioctl(t, TIOCNOTTY, (char *) 0);   /* give up the controlling terminal */
        close(t);
    }

Having created the socket, placed the process in the background, and voided the terminal association, the server process now enters an interface loop. The loop has to receive commands from clients, execute the request, and return the output to the client:

    if (recvfrom(server_id, command, sizeof(command), 0, &client_sock, &length) < 0) {
        perror("recvfrom()");
        close(server_id);
        exit(1);
    }
    psystem(command, &infd, &outfd, &errfd, 0);
    while ((length = read(outfd, data, 512)) > 0)
        if (sendto(server_id, data, length, 0, &client_sock, sizeof(client_sock)) < 0) {
            perror("sendto()");
            close(server_id);
            exit(1);
        }

awaits the incoming 'command' from 'client_sock' and passes the request to 'psystem'. 'psystem' is an in-house routine that creates a suitable number of pipes, forks a new process, arranges that the child process redirects the new pipes to be its 'stdin', 'stdout' and 'stderr', and then 'execs' the new process, which executes the command. The actual implementation of 'psystem' is not, however, relevant to this example. After the command has been executed, the output is followed by sending 'END' and closing the descriptors.
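Since 'psystem' itself is not shown, readers who want to experiment need some stand-in. The sketch below uses the standard 'popen' routine, which likewise creates a pipe, forks and execs a shell to run the command. It is not the authors' 'psystem': it captures only the standard output of the command, and the name 'run_and_capture' is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L            /* for popen and pclose */
#include <assert.h>
#include <stdio.h>

/* Run `command` through the shell and copy up to outlen - 1 bytes of
 * its standard output into out, which is always NUL terminated.
 * Returns the number of bytes captured, or -1 if the command could
 * not be started. */
long run_and_capture(const char *command, char *out, size_t outlen)
{
    FILE *from_cmd;
    size_t total = 0, n;

    assert(command != NULL && out != NULL && outlen > 0);

    from_cmd = popen(command, "r");        /* pipe + fork + exec of the shell */
    if (from_cmd == NULL)
        return -1;

    while (total < outlen - 1 &&
           (n = fread(out + total, 1, outlen - 1 - total, from_cmd)) > 0)
        total += n;
    out[total] = '\0';

    pclose(from_cmd);                      /* wait for the command to finish */
    return (long)total;
}
```

A server built on this helper would send the captured bytes back with 'sendto' in place of the read loop above. As the concluding comments point out, executing arbitrary commands on behalf of a remote client has serious security implications.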

Client process - rex

'rex' is similar to 'rexd' so the entire program will not be covered in detail. The program has to carry out three tasks:

• obtain the user command and find out where to send it, i.e. the location of the server;
• send the request;
• await the reply.

The client has to know when the server has completed its task, i.e. when all the server's output has been received. The authors have chosen to append the sequence 'END' to the output from the server, to indicate this to the client. Using 'END' is clumsy but has the advantage of being both straightforward and obvious.

    client 'rex'  -------- command -------->  'rexd' server
    client 'rex'  <----- output + 'END' -----  'rexd' server

Socket creation is as before. However, the client is an application that exists for a finite time and does not have a well known, fixed port number. It has to determine the port number to use. One method is to start at 1024 and repeatedly attempt to bind the socket until no error is reported. When that happens there is a bound socket with a valid port number only usable by the client that chose it, for the lifetime of the client. The following code fragment is responsible for this:

    for (client_port = IPPORT_RESERVED; client_port <= 32767; ++client_port) {
        client_sock.sin_port = htons((u_short)client_port);
        if (bind(client_id, &client_sock, sizeof(client_sock)) >= 0)
            break;                         /* found a valid port number, exit loop */
        if (errno == EADDRINUSE || errno == EADDRNOTAVAIL)
            continue;                      /* try another one */
        perror("bind()");
        close(client_id);
        exit(1);
    }
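The port search can be packaged as a reusable function. The following is a sketch in the spirit of the fragment above, not a copy of it: the name 'bind_first_free' and the explicit port range parameters are assumptions, and the cast follows present-day prototypes.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Try to bind fd to each port from `first` to `last` in turn, using the
 * family and address already filled in `name`. On success the chosen
 * port (host byte order) is returned and name->sin_port holds it in
 * network byte order. Returns -1 if no port could be bound. */
int bind_first_free(int fd, struct sockaddr_in *name, int first, int last)
{
    int port;

    assert(name != NULL);
    assert(first > 0 && last <= 65535 && first <= last);

    for (port = first; port <= last; ++port) {
        name->sin_port = htons((unsigned short)port);
        if (bind(fd, (struct sockaddr *)name, sizeof(*name)) == 0)
            return port;                   /* found a usable port */
        if (errno == EADDRINUSE || errno == EADDRNOTAVAIL)
            continue;                      /* taken: try the next one */
        return -1;                         /* some other error: give up */
    }
    return -1;
}
```

Two clients on the same machine calling this with the same range end up on different ports, because the second bind to an occupied port fails with EADDRINUSE and the search moves on.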


The server to be contacted is, of course, on the machine provided as the host when the command was issued. It is possible to fill in all the details of the server as before (gethostbyname etc.) except for the port number. This is fetched from '/etc/services' as follows:

    if ((ss = getservbyname("rexd", "udp")) == 0) {
        perror("getservbyname()");
        close(client_id);
        exit(1);
    }

and we can then fill in 'server_sock' in the usual manner:

    server_sock.sin_family = AF_INET;
    server_sock.sin_addr = server_addr;
    server_sock.sin_port = ss->s_port;     /* already in network byte order */

The command the user requested for execution is then sent to the appropriate server process:

    if (sendto(client_id, command, sizeof(command), 0, &server_sock, sizeof(server_sock)) < 0) {
        perror("sendto()");
        close(client_id);
        exit(1);
    }

and the client enters an 'infinite' loop, receiving data from the server, until the pattern 'END' (signifying server completion) is read. The loop to do this looks like:

    length = sizeof(server_sock);
    for ( ; ; ) { /* ever */
        if ((bytesrecvd = recvfrom(client_id, data, sizeof(data), 0, &server_sock, &length)) < 0) {
            perror("recvfrom()");
            close(client_id);
            exit(1);
        }
        if (strncmp(data, "END", 3) == 0 && bytesrecvd == 3)
            break;
        write(1, data, bytesrecvd);        /* copy the bytes just received to stdout */
    }
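The end-of-output test in the loop can be isolated into a small predicate, which makes its behaviour easy to check. The function name 'is_end_marker' is hypothetical; the rule it implements is the one described in the text, namely a datagram of exactly three bytes spelling 'END'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if a received datagram of nbytes bytes is exactly the
 * three byte terminator "END", 0 otherwise. */
int is_end_marker(const char *data, long nbytes)
{
    assert(data != NULL);
    return nbytes == 3 && memcmp(data, "END", 3) == 0;
}
```

Checking the length as well as the contents means that output which merely begins with 'END' is passed through. A command whose output happens to arrive as a datagram containing exactly 'END' would still end the session early, which is one reason the scheme is described as clumsy.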

That completes the discussion of the example. Listings of the two programs are given in the Appendices.

CONCLUDING COMMENTS

The socket abstraction is provided only with 4.2 BSD versions of Unix, and upward. The problem of communi- cation between processes on 4.2 and non4.2 systems should be considered. One possible solution is emulating sockets on non4.2 systems and installing the necessary Intemet protocol software; indeed, this approach is being adopted by AT&T who are writing a sockets emulation library for System V Release 3.0 over'streams' - - their own IPC facility °. As the Internet protocols are widely used, implementations of them exist for other operating systems, such as Digital Equipment Corporation's VMS.

Sockets are a significant improvement on previous IPC facilities, which were generally confined to pipes and signals. Earlier versions of Unix have provided other forms of IPC. Version 7, for example, carried an experimental IPC facility, known as 'mpx' files. System V provides IPC in the form of streams but these are not as widely available as sockets (sockets have become established), and are believed to be weak, as yet, regarding communication between processes over networks.

Many problems can be encountered in the programming of socket-based applications, most involving socket addresses: either a wrong choice of address/port, or accidental reuse of an address/port or address/port pair. Byte swapping is also a nuisance. Library routines with names beginning 'inet_' have been prone to failure for no apparent reason (inet_addr is a good example). Some of these failures are due to conflicting structure definitions in the header files. Problems such as these are clearly bugs in the 4.2 BSD implementation of sockets and hopefully have been rectified in the 4.3 release.
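The byte-swapping and address-conversion pitfalls can be illustrated with a short sketch. It assumes a system that provides inet_pton(), a later interface than the 4.2 BSD inet_addr(); the helper names parse_ipv4 and fill_sockaddr are hypothetical. inet_addr() signals failure with a value that is also a valid address (255.255.255.255), whereas inet_pton() returns a separate status.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Convert dotted-decimal text to an address; 0 on success, -1 on failure. */
int parse_ipv4(const char *text, struct in_addr *out)
{
    assert(text != NULL && out != NULL);
    return inet_pton(AF_INET, text, out) == 1 ? 0 : -1;
}

/*
 * Fill in a socket address from a textual address and a port given
 * in host byte order. The port is converted with htons() exactly
 * once, here, so that callers never handle a network-order port.
 * Returns 0 on success, -1 if the address text is not valid.
 */
int fill_sockaddr(struct sockaddr_in *sa, const char *text,
                  unsigned short port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return parse_ipv4(text, &sa->sin_addr);
}
```

Keeping every conversion inside one routine is a design choice rather than a requirement; it simply makes it harder to apply htons() twice or not at all.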

Finally, it must be pointed out that no security checks are made between the client and server during a transaction in the programming example. For instance, 'rexd' running as a privileged process allows the client to do almost anything on the remote machine. This problem is easily rectified but would have made the example more difficult to understand and, as the main aim was a discussion of the use of sockets, this has been omitted. Security is something that should be seriously considered in 'real' applications.
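One possible check, offered as an assumption rather than as the authors' intended remedy, is for the server to compare each incoming command with a fixed list before executing it. The list contents and the name command_allowed below are purely illustrative, and the sketch assumes the received command is NUL-terminated, as it is when the client sends its whole buffer. It does nothing to authenticate the client.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list of commands the server is willing to run. */
static const char *allowed_commands[] = { "date", "who", "uptime" };

/*
 * Return 1 if cmd matches an entry in allowed_commands exactly,
 * otherwise 0. Exact matching rejects added arguments and shell
 * metacharacters along with unknown commands.
 */
int command_allowed(const char *cmd)
{
    size_t i;
    size_t n = sizeof(allowed_commands) / sizeof(allowed_commands[0]);

    assert(cmd != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(cmd, allowed_commands[i]) == 0)
            return 1;
    return 0;
}
```

The server would call command_allowed(command) after recvfrom returns and ignore the request when the result is 0.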

ACKNOWLEDGEMENTS

Thanks are due to Stephen Muir for 'psystem' and David Hutchison and John Gallagher for their comments on an initial draft of this paper.

REFERENCES

1 Cerf, V and Cain, E 'The DOD Internet architecture model' Comput. Network. Vol 7 (October 1983) pp 307-318

2 Boggs, D R et al. 'Pup: an Internetwork architecture' IEEE Trans. Comm. Vol 4 (April 1980) pp 612-624

3 Postel, J 'User datagram protocol' RFC 768 USC/ Information Sciences Institute (August 1980)

4 Postel, J 'DOD standard transmission control protocol' Comput. Commun. Rev. Vol 10 No 4 (October 1980) pp 52-132

5 'Courier: The Remote Procedure Call Protocol' Xerox System Integration Standard S 038112 Xerox Corporation, Stamford, USA (December 1981)

6 Leffler, S J et al. A 4.2BSD interprocess communication primer Computer Systems Research Group, Department of Electrical Engineering and Computer Science, University of California, USA (1983)

7 Dalal, Y K 'Use of multiple networks in the Xerox network system' Computer (October 1982) pp 82-92

8 Ritchie, D M 'A stream input-output system' AT&T Bell Lab. Tech. J. Vol 63 No 8(2) (October 1984) pp 1897-1910



APPENDIX A: "REXD.C"

/*
 * The server code: waits for incoming requests,
 * executes the command and returns the results to the client.
 */

#include <stdio.h>
#include <strings.h>
#include <sys/errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define SERVER_PORT 2001

struct protoent *getprotobyname();
struct hostent *gethostent(), *gethostbyname();

int server_id;
struct protoent *pp;
struct hostent *server;
struct sockaddr_in server_sock, client_sock;
struct in_addr server_addr;

main()
{
    char hostname[40];
    char command[BUFSIZ], data[BUFSIZ]; /* message and data buffers */
    int length, outfd, infd, errfd;

    if (gethostname(hostname, 40) != 0) {
        fprintf(stderr, "I don't know who I am!\n");
        exit(1);
    }
    if ((server = gethostbyname(hostname)) == 0) {
        fprintf(stderr, "source host %s unknown\n", hostname);
        exit(1);
    }
    bcopy(server->h_addr, (char *)&server_addr, server->h_length);

#ifdef DEBUG
    printf("\nserver:\n");
    printf("name %s\n", server->h_name);
    printf("address type %d\n", server->h_addrtype);
    printf("length of address %d\n", server->h_length);
    printf("Internet address %s\n\n", inet_ntoa(server_addr));
#endif

    if ((pp = getprotobyname("udp")) == 0) { /* choose udp protocol */
        fprintf(stderr, "protocol not found: check \"/etc/protocols\"\n");
        exit(1);
    }
    /* create an Internet socket on the host machine */
    if ((server_id = socket(AF_INET, SOCK_DGRAM, pp->p_proto)) == -1) {
        perror("socket()");
        exit(1);
    }
#ifdef DEBUG
    printf("server socket created\n");
#endif

    /* fill in the server machine's socket details */
    server_sock.sin_family = AF_INET;
    server_sock.sin_addr = server_addr;
    server_sock.sin_port = htons((u_short)SERVER_PORT);
    /* now bind the socket */
    if (bind(server_id, &server_sock, sizeof(server_sock)) == -1) {
        perror("bind()");
        close(server_id);
        exit(1);
    }
#ifdef DEBUG
    printf("server port # %d\n", ntohs(server_sock.sin_port));
#else /* it's debugged, so run it in the background */
    {
        int t;

        while ((t = fork()) == -1)
            sleep(1);
        if (t)
            exit(0);
        /*
         * void process terminal association:
         * making sure all open files associated with the tty
         * are closed with the exception of the socket itself
         */
        for (t = 0; t <= 20; ++t)
            if (t != server_id)
                close(t);
        if ((t = open("/dev/tty", 2)) >= 0) {
            ioctl(t, TIOCNOTTY, (char *)0);
            close(t);
        }
    }
#endif
    for (;;) { /* ever */
#ifdef DEBUG
        printf("listening on (address, port): %s, %d\n",
            inet_ntoa(server_sock.sin_addr),
            ntohs(server_sock.sin_port));
#endif
        length = sizeof(client_sock);
        if (recvfrom(server_id, command, sizeof(command), 0,
            &client_sock, &length) < 0) {
            perror("recvfrom()");
            exit(1);
        }
#ifdef DEBUG
        printf("incoming command from (address, port):\n");
        printf("%s ", inet_ntoa(client_sock.sin_addr));
        printf("%d\n", ntohs(client_sock.sin_port));
        printf("command = \"%s\"\n", command);
#endif
        psystem(command, &infd, &outfd, &errfd, 0);
        while ((length = read(outfd, data, 512)) > 0)
            if (sendto(server_id, data, length, 0, &client_sock,
                sizeof(client_sock)) < 0) {
                perror("sendto()");
                close(server_id);
                exit(1);
            }
        wait(0);
        close(infd);
        close(errfd);
        close(outfd);
        strcpy(data, "END");
        if (sendto(server_id, data, strlen(data), 0, &client_sock,
            sizeof(client_sock)) < 0) {
            perror("sendto()");
            close(server_id);
            exit(1);
        }
    }
}
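The server listing calls psystem(), whose source is not reproduced in this paper. For readers who wish to experiment, the following is a rough stand-in inferred only from the way the call is used in the listing; it is an assumption, not the routine acknowledged by the authors. The name run_command is hypothetical, only the standard output of the command is captured, and the caller is expected to read from the descriptor, close it and wait for the child.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for psystem(): run cmd through /bin/sh and
 * store in *outfd a descriptor from which its standard output can
 * be read. Returns the child's process id, or -1 on failure.
 */
pid_t run_command(const char *cmd, int *outfd)
{
    int fds[2];
    pid_t pid;

    assert(cmd != NULL && outfd != NULL);
    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(fds[0]);
        if (fds[1] != 1) {              /* pipe may already be fd 1 */
            dup2(fds[1], 1);            /* standard output -> pipe */
            close(fds[1]);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);                     /* reached only if exec fails */
    }
    close(fds[1]);                      /* parent keeps the read end */
    *outfd = fds[0];
    return pid;
}
```

The check on fds[1] matters in a server that has closed its standard descriptors before going into the background, since the pipe may then be allocated descriptors 0 and 1.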

APPENDIX B: "REX.C"

/*
 * Code that sends a command to another host for execution.
 */

#include <stdio.h>
#include <strings.h>
#include <sys/errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

extern int errno;
struct protoent *getprotobyname();
struct hostent *gethostent(), *gethostbyname();

int client_id, client_port;
struct protoent *pp;
struct servent *ss;
struct hostent *client, *server;
struct sockaddr_in client_sock, server_sock;
struct in_addr client_addr, server_addr;

main(argc, argv)
int argc;
char *argv[];
{
    char hostname[40];
    char command[BUFSIZ], data[BUFSIZ]; /* command and data buffers */
    int length, bytesrecvd;

    if (argc < 3) {
        fprintf(stderr, "usage: rex hostname command\n");
        exit(1);
    }
    if (gethostname(hostname, 40) != 0) {
        fprintf(stderr, "I don't know who I am!\n");
        exit(1);
    }
    if ((client = gethostbyname(hostname)) == 0) {
        fprintf(stderr, "source host %s unknown\n", hostname);
        exit(1);
    }
    bcopy(client->h_addr, (char *)&client_addr, client->h_length);

#ifdef DEBUG
    printf("\nclient:\n");
    printf("name %s\n", client->h_name);
    printf("address type %d\n", client->h_addrtype);
    printf("length of address %d\n", client->h_length);
    printf("Internet address %s\n\n", inet_ntoa(client_addr));



#endif

    if ((pp = getprotobyname("udp")) == 0) { /* choose udp protocol */
        fprintf(stderr, "protocol not found: check \"/etc/protocols\"\n");
        exit(1);
    }
    /* create an Internet socket on the host machine */
    if ((client_id = socket(AF_INET, SOCK_DGRAM, pp->p_proto)) == -1) {
        perror("socket()");
        exit(1);
    }
#ifdef DEBUG
    printf("client socket created\n");
#endif

    /* fill in the socket details for the client */
    client_sock.sin_family = AF_INET;
    client_sock.sin_addr = client_addr;
    /* now find a suitable free port number and bind the socket */
    for (client_port = IPPORT_RESERVED; client_port <= 32767;
        ++client_port) {
        client_sock.sin_port = htons((u_short)client_port);
        if (bind(client_id, &client_sock, sizeof(client_sock)) >= 0)
            break; /* we've found a valid port number */
        if (errno == EADDRINUSE || errno == EADDRNOTAVAIL)
            continue; /* try another one */
        perror("bind()");
        close(client_id);
        exit(1);
    }
#ifdef DEBUG
    printf("client port # %d\n", ntohs(client_sock.sin_port));
#endif

    /* client socket is now set up for use */
    strcpy(hostname, argv[1]);
    strcpy(command, argv[2]);
    if ((server = gethostbyname(hostname)) == 0) {
        fprintf(stderr, "server host \"%s\" unknown\n", hostname);
        close(client_id);
        exit(1);
    }
    bcopy(server->h_addr, (char *)&server_addr, server->h_length);
    /* fill in the server socket's details */
    /* find out the port number by looking up "/etc/services" */
    if ((ss = getservbyname("rexd", "udp")) == 0) {
        perror("getservbyname()");
        close(client_id);
        exit(1);
    }
    server_sock.sin_family = AF_INET;
    server_sock.sin_addr = server_addr;
    server_sock.sin_port = ss->s_port;

#ifdef DEBUG
    printf("command \"%s\" ", command);
    printf("for execution on host \"%s\" (address, port) %s, %d\n",
        hostname, inet_ntoa(server_sock.sin_addr),
        ntohs(server_sock.sin_port));
#endif
    if (sendto(client_id, command, sizeof(command), 0, &server_sock,
        sizeof(server_sock)) < 0) {
        perror("sendto()");
        close(client_id);
        exit(1);
    }
    length = sizeof(server_sock);
    for (;;) {
        if ((bytesrecvd = recvfrom(client_id, data, sizeof(data), 0,
            &server_sock, &length)) < 0) {
            perror("recvfrom");
            close(client_id);
            exit(1);
        }
        if (strncmp(data, "END", 3) == 0 && bytesrecvd == 3)
            break;
        /* write only the bytes actually received in this datagram */
        write(1, data, bytesrecvd);
    }
    close(client_id);
    exit(0);
}
