
IEEE 2013 National Conference on Parallel Computing Technologies (PARCOMPTECH), Bangalore, India

Performance Analysis of Sun RPC

Shwetabh Srivastava and Pranay Kumar Srivastava Engineering and R&D Services,

HCL Technologies Ltd., Noida, India

Abstract - RPC (Remote Procedure Call) is one of the ways of creating distributed client-server applications. Sun RPC (ONC RPC) is an old yet still popular implementation of RPC on UNIX-based systems. However, the Sun RPC implementation suffers from poor performance even on high-speed hardware. In this paper we give a brief overview of Sun RPC, a performance analysis of the Sun RPC library, and the different optimization techniques that can be applied to enhance its performance.

Index Terms - RPC protocol, Sun RPC, optimization

I. INTRODUCTION

Remote Procedure Call (RPC) is an interprocess communication mechanism that allows a computer program to call or execute a procedure or routine in another address space (generally on a remote machine). The RPC protocol makes the remote procedure look like a local one: a call to the remote procedure is made transparently on the local machine, but the actual computation takes place on a distant machine.

The Sun RPC implementation is very popular and widely used for RPC applications despite being created back in 1987. Hardware (NIC, CPU, memory) performance has improved significantly over time, but our experiments (explained in later sections) show that Sun RPC performance does not scale with the underlying hardware, and this can become a severe issue for RPC-based architectures. We performed experiments using the Sun RPC library and the general TCP socket library to compare and analyze the performance of the Sun RPC library. The Sun RPC library was then investigated to find ways to improve performance without modifying the library itself. A detailed description of the Sun RPC library and the RPC protocol is not covered in this document. However, before explaining the experiments and analysis conducted, let us take a brief overview of Sun RPC and of the tools and steps needed to create an RPC application on a UNIX-based system such as Linux.

II. SUN RPC PROTOCOL

The Sun RPC protocol was introduced for the implementation of distributed services between heterogeneous machines. It is widely used in distributed operating system design and implementation, especially for distributed filesystems and services such as NFS and NIS. As distributed networks are often heterogeneous, Sun RPC maintains the consistency of data between the different systems on the network by using a machine-independent data representation called External Data Representation (the Sun XDR protocol). By default, Linux provides the Sun RPC library.

A. Architecture of Sun RPC

Sun RPC consists of a set of library functions and a stub generator (the RPCGEN compiler) that follow the Sun XDR standard for data representation. Please note that Sun RPC is a single-threaded implementation on Linux.

1) Layers in Sun RPC architecture

Sun RPC has a layered architecture and is implemented as a series of layers. Each layer in Sun RPC is assigned a dedicated task and provides services to the other layers. Figure 1 depicts the different layers involved in the RPC implementation [2].

Figure 1: RPC Layer Architecture [2]

Network stack: This layer implements the read and write system calls that transfer the data across the network.

Stream: Sun RPC maintains a stream buffer for sending and receiving data between client and server. This layer performs buffer management by providing a set of functions for reading and writing user data to and from the stream. It hides the details of buffer management and network packet size from the higher layers.

XDR: Sun RPC uses XDR encoding and decoding of primitive data types to maintain the consistency of the data. This layer implements the XDR data representation specification and insulates the higher layers from machine-specific data representation issues. Data transferred between the nodes in the network is translated to XDR format before sending and translated back from XDR when received.

RPCLIB: This layer implements the RPC protocol, including the client and server connections.

RPCGEN: This layer acts as the stub generator and produces the RPC stubs based upon the interface definition provided by the user. RPCGEN is a compiler which accepts the remote program interface definition in RPC language and produces C language output, which includes stub versions of the client routines, the server skeleton, the XDR filter routines, and a header file containing the common definitions. Both the client and server stubs generated through RPCGEN hide the network implementation details.

2) External Data Representation (XDR)

Sun RPC uses the XDR standard for maintaining the consistency of the data across the network. This standard is independent of languages, operating systems, and hardware architectures.

XDR uses a language for describing data formats. Though this language is used only to describe data, it is not a programming language; it allows programmers to describe different types of primitive data formats in a concise manner. The following assumptions hold for representing any data type in XDR:

A given hardware device should encode the bytes in such a way that other hardware devices may decode the bytes without loss of meaning.

In XDR representation, every data type is represented in the multiple of 4 bytes (or 32 bits) blocks. The bytes are numbered 0 through n-1.

The bytes are read or written to some byte stream such that byte m always precedes byte m+1.

If the n bytes needed to contain the data are not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of 4.

III. RPC PROGRAM COMPONENTS AND INTERFACES

Let us take a look at the RPC program components and their interfaces (stubs). We will go through the data and message passing mechanism, which will later help us understand the performance bottleneck in RPC. Explaining the RPC application development process is out of the scope of this document.

To develop an RPC application the following steps are needed:

1. Specify the protocol for client-server communication: the rpcgen protocol compiler is used in this step; it generates the client, server, and XDR stubs.

2. Develop the client and server programs: they use the interfaces provided by RPC and communicate via the procedures and data types specified in the protocol.
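As a hypothetical illustration of step 1, a minimal protocol specification in RPC language (a .x file) might look like the following; the program name and numbers are ours, not from the paper:

```
/* msg.x -- hypothetical interface definition in RPC language */
program MSGPROG {
    version MSGVERS {
        int SEND_MESSAGE(string) = 1;   /* procedure number 1 */
    } = 1;                              /* version number */
} = 0x20000001;                         /* program number */
```

Running rpcgen on such a file produces the client stub, the server skeleton, the XDR filter routines, and a common header file.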

Figure 2: Local and Remote Procedure Call [3]

Figure 2 shows a local and a remote procedure call [3]. RPC mechanisms can be built on top of either a connection-oriented (TCP) or a connectionless (UDP) transport service. In our tests, we only consider TCP-based connections.

A. Client Server Communication in Sun RPC

Following are the summarized operations performed in client-server communication through RPC:

- The client encodes the user data and copies it to the output buffer in the stream layer
- The encoded output buffer is transferred over the network
- The server receives the encoded buffer
- The server decodes the encoded buffer
- The server calls the appropriate service requested by the client
- The server sends the encoded result back to the client
- The encoded result is decoded at the client side

Figure 3 depicts the client server communication through RPC [4]:

Figure 3: Client Server Communication in RPC [4]

B. XDR Routines

The XDR routines provide interfaces for translating almost all C data types to their corresponding XDR representation, to handle the different data representations of different machines. These interfaces are defined in the header file <rpc/xdr.h>. Corresponding to each built-in C data type, a single XDR interface is implemented which does both encoding and decoding of the data. For example, the following interface depicts the general functionality of an XDR routine:

bool_t xdrproc(xdrs, argresp)
    XDR *xdrs;
    <type> *argresp;

In the above interface, xdrs is an instance of the XDR handle to which or from which the data type is to be converted, and argresp is a pointer to the structure to be converted. The XDR structure is defined in <rpc/xdr.h>.

The XDR handle contains an operation field, xdr_op, whose value can be ENCODE, DECODE, or FREE. This field indicates which operation needs to be performed on the data.

C. Input/Output XDR Stream in Sun RPC

Sun RPC maintains streams for sending and receiving the encoded data across the network.

Sun RPC has different types of stream implementations. The stream implementation also differs based upon the type of protocol (TCP/UDP) used in the communication. In Sun RPC, UDP uses the memory stream and TCP uses the record stream.

1) Memory Stream

Memory streams allow the streaming of data into or out of a specified area of memory. Currently the UDP/IP implementation of RPC uses the memory stream. The RPC library uses pointer variables for reading/writing the data and managing the stream. These variables are member variables of the XDR structure defined in rpc/xdr.h. The following structure members are involved:

x_op : flag for ENCODING or DECODING
x_base : base pointer to the start of the buffer
x_handy : remaining space in the buffer
x_private : current pointer into the buffer

Each time a new encoding or decoding is performed, these pointer variables are recalculated to check for buffer overflow and to manage the stream. The stream buffer is read from or written to the network through the network layer of Sun RPC. The declaration of the interface for creating a memory stream, xdrmem_create(), is present in the RPC library header rpc/rpc.h. Figure 4 depicts the stream management for memory streams [4].

Figure 4: Memory Stream [4]

In our experiments, we are not using memory streams, as our communication is TCP based.

2) Record Stream

The record stream interface in the Sun RPC library provides a bidirectional, arbitrarily long sequence of records. A record is composed of one or more record fragments. A record fragment is a 32-bit header followed by n bytes of data, where n is contained in the header. The high-order bit of the header encodes whether or not a fragment is the last fragment of the record (1 = last fragment, 0 = more fragments follow). The remaining 31 bits encode the byte length of the fragment.

This stream is primarily used for interfacing Sun RPC to TCP connections. Like the memory stream, the record stream also maintains variables for managing the record buffer and checking for buffer overflow. These are member variables of the structure rec_stream defined in the Sun RPC library file xdr_rec.c.

Figure 5 depicts the different types of variables maintained by the RPC library for managing the record stream [4]. The receive size and send size are the sizes of the record fragment buffer for read and write operations respectively. The stream buffer is read from or written to the network through the network layer of Sun RPC. Implementation details of the record stream interface are present in the Sun RPC library file xdr_rec.c.

Figure 5: Record Stream [4]


IV. PERFORMANCE ANALYSIS OF SUN RPC LIBRARY

A. Environment

TABLE I. SYSTEM CONFIGURATION

Parameter           Value
CPU                 Intel Xeon CPU E5-2690 2.90 GHz
No. of Processors   32
RAM                 96 GB
Swap Space          4 GB
NIC                 Chelsio 10G

TABLE II. SOFTWARE CONFIGURATION

Parameter                    Value
LIBC (RPC Library) Version   2.12
Distribution                 Red Hat
Kernel Version               2.6.32-279.el6.x86_64
Filesystem                   ext4

The same environment is used for all the experiments mentioned in this document.

B. Experiment Details

For checking the performance of the Sun RPC library, we created a client-server program using the Sun RPC library which measures the network throughput (data transferred per second in MB/s1 over the network) in the following way:

The test program sends different sizes of data blocks between client and server using the Sun RPC library. The test program was executed with a start block size of 4 KB and an end block size of 64 MB. The program calculates the throughput by performing write operations on the server, starting from 4 KB and doubling the block size until it reaches 64 MB. After the write operations are done, read operations on the same block sizes are performed. Hence both read and write throughput are calculated by the test program.

The test program was executed on a high-speed network (10000 Mb/s2) with a network card of capacity 10 Gb/s3.

1) Result

Figure 6 gives the throughput results calculated from the test program.

1 MB/s = megabyte / second
2 Mb/s = megabit / second
3 Gb/s = gigabit / second

Figure 6: RPC Test Program Performance Results

Considering the speed of the network and the efficiency of the network card (the maximum efficiency of the network card is approximately 1.2 GB/s), the above results from the Sun RPC library are far below expectation (even assuming only 50% of the maximum efficiency of the network card, i.e. about 600 MB/s).

2) Further Experiment and Result

We analyzed the RPC library implementation and found that the network layer of the RPC library uses a TCP socket (for our analysis, the TCP protocol is used for the communication between client and server) for sending and receiving the data between client and server. To investigate how much time is consumed in the RPC layered protocol, we created a client-server program using simple socket programming which performs a similar throughput calculation test using a TCP socket (without the RPC routines).

The TCP socket test program was run for 60 minutes, reading and writing block sizes from 4 KB to 64 MB in the same manner as the Sun RPC test program, and we found that the throughput obtained with the plain TCP socket is much better than that of the Sun RPC library. Figure 7 shows the throughput calculated using the TCP socket.

<Start time>
Remote procedure call for sending data
<End time>
Network throughput = Data size / <End time - Start time>

<Start time>
Remote procedure call for reading data
<End time>
Network throughput = Data size / <End time - Start time>

Figure 7: TCP Socket Test Program Performance Results

C. Analysis and Conclusion

Based upon the above results, we concluded that the RPC library must have some performance bottleneck which causes the degradation in throughput, as both programs were using TCP sockets. On further investigation and analysis of the Sun RPC library code, we observed the following points with respect to performance bottlenecks:

- Due to the layered architecture, there is communication between the different layers of RPC at both the client and server side during RPC calls. This can cause performance degradation.

- Time is consumed in encoding and decoding the request and response to and from XDR format at both the client and server side. Moreover, due to the layered architecture, each call to XDR results in a chain of several procedure calls.

- In RPC, data is copied from the user buffer into the stream buffer and then passed to the operating system functions. Hence in RPC there are two copies of data per side per RPC call at user level, plus at least one copy in the kernel for the kernel driver buffer.

- Time may be consumed in managing the XDR stream buffer, as the buffer management variables are recalculated every time data is encoded to XDR format.

- Time may be consumed (although not significantly) in packet initialization, as RPC request and response headers are created for sending data at both the client and server side.

V. PERFORMANCE OPTIMIZATION OF SUN RPC LIBRARY

From the analysis of the Sun RPC library implementation, it was found that the major performance bottleneck is the XDR encoding and decoding of the data. We investigated further and decided to analyze the Sun RPC library behavior for our test program, considering changes to the library code itself out of scope.

In our test program, we were using a variable-length character buffer for sending the data across the network. We checked the Sun RPC library XDR implementation source code for the variable-length character buffer. During this analysis we found that if we use the variable-length opaque data type for sending the data buffer across the network, the XDR encoding and decoding overhead can be bypassed.

A. Analyzing RPC Library using Variable length Character Buffer

We used a variable-length character buffer in our test program for sending different sizes of data blocks between client and server and analyzed the different operations performed by the RPC library. The flow chart (fig. 8) depicts the different operations performed by our test program and the RPC library when the variable-length character buffer is used for sending the data between the client and the server.

char buffer<> (the XDR notation for declaring a variable-length character buffer in a .x file) is used for sending the variable-size data blocks.

Corresponding to the char buffer<> declaration in the .x file, RPCGEN generates the structure

struct {
    u_int buffer_len;
    char *buffer_val;
} buffer;

where buffer_len is the length of the buffer and buffer_val points to the buffer.

RPCGEN generates the XDR routine xdr_array for encoding/decoding the character buffer in the above structure (if the XDR option is set to encode the function will encode the buffer, and if the option is set to decode it will decode it). The elements of the generated structure are passed as arguments to the xdr_array routine. Below are the interface details of the xdr_array routine.

bool_t xdr_array (xdrs, addrp, sizep, maxsize, elsize, elproc)
    XDR *xdrs;
    caddr_t *addrp;   /* array pointer */
    u_int *sizep;     /* number of elements */
    u_int maxsize;    /* max number of elements */
    u_int elsize;     /* size in bytes of each element */
    xdrproc_t elproc; /* xdr routine to handle each element */


Figure 8: RPC operations with XDR using char buffer

xdr_array first encodes the length of the buffer pointed to by sizep as a 4-byte unsigned integer and copies this length into the data part of the RPC output stream.

After encoding the length of the buffer, the xdr_array routine encodes each character byte pointed to by addrp into a 4-byte XDR representation (this part of the performance bottleneck is highlighted in the flow diagram, fig. 8) and copies it into the data part of the RPC output stream. Figure 9 depicts the representation of the variable-length character buffer in the output stream when using the xdr_array routine.

Figure 9: Encoded Data in the variable length character buffer

Each element in the above figure is the 4-byte encoded XDR representation of a single-byte character element.

Also, each time a single character is encoded to its 4-byte XDR representation, the record stream (we are using the TCP protocol in our test program) buffer management variables are recalculated to check for overflow and are updated accordingly (this can be considered extra overhead of using XDR).

Similar functionality occurs at the receiving or decoding side, but in reverse order, and the same xdr_array function is used for decoding the buffer value. First the length of the buffer is read from the RPC input stream, and then the value of the buffer is read and decoded.

B. Analyzing RPC library using Opaque data type

To bypass the overhead caused by the XDR encoding/decoding, we used the data type called the opaque data type.

A variable-length opaque data type is defined as a sequence of n arbitrary bytes, numbered 0 through n-1. The opaque data length n is encoded as a 4-byte unsigned integer followed by the n bytes of the sequence. Byte m of the sequence always precedes byte m+1, and byte 0 of the sequence always follows the sequence length (count). Enough (0 to 3) residual bytes, r, are added to make the total byte count a multiple of 4.

The below diagram depicts a variable-length opaque data item occupying 4+n+r bytes.

Figure 10: Encoded Data in the variable length Opaque data type

So rather than defining a variable-length character buffer for sending the data block, we used a variable-length opaque buffer. Following are the operations performed by the RPC library when the opaque buffer is used.

opaque buffer<> (the XDR notation for declaring a variable-length opaque buffer in a .x file) is used for sending the variable-size data blocks.

Corresponding to the opaque buffer<> declaration in the .x XDR file, RPCGEN generates the structure

struct {
    u_int buffer_len;
    char *buffer_val;
} buffer;

where buffer_len is the length of the buffer and buffer_val points to the data.

RPCGEN generates the XDR routine xdr_bytes for encoding the variable length opaque buffer.

The xdr_bytes routine either encodes or decodes the length (buffer_len in the above structure) of the buffer based on the XDR option. If the XDR option is encode, it encodes the length of the buffer into a 4-byte unsigned integer and copies it onto the RPC output stream; if the XDR option is set to decode, it reads the length of the buffer from the RPC stream and decodes it. Below are the interface details of the xdr_bytes routine:

bool_t xdr_bytes (xdrs, cpp, sizep, maxsize)
    XDR *xdrs;
    char **cpp;
    u_int *sizep;
    u_int maxsize;


where cpp is a pointer to the data buffer and sizep is the length of the buffer.

On further analysis of the xdr_bytes function in the RPC library, we found that it internally calls the xdrrec_putbytes function, which simply copies the buffer data pointed to by cpp as-is (with no modification) to the output stream. Below is the interface detail of the xdrrec_putbytes routine:

static bool_t xdrrec_putbytes (XDR *xdrs, const char *addr, u_int len)

where addr points to the data buffer and len is the length of the buffer.

Unlike the variable-length character buffer (where the xdr_array routine encodes every single byte of the character buffer to 4 XDR bytes and copies it to the output stream), with the variable-length opaque data type only the length of the buffer is encoded to the 4-byte XDR representation, and the data buffer pointed to by cpp is copied directly, as-is, to the output stream.

With variable-length opaque data, the variables needed for managing the stream are also not recalculated for every element as in the case of xdr_array.

Also, with the variable-length opaque data type, fewer bytes are transferred across the network (with the variable-length character buffer, xdr_array encodes every single character byte to a 4-byte XDR representation), as the buffer data is copied directly onto the output stream. Figure 11 depicts the various operations performed by the RPC library while sending the opaque data type. The difference in Sun RPC library behavior between the variable-length opaque data and the character buffer is highlighted in fig. 11.

(The figure shows the following flow: the opaque buffer<> declaration is processed by RPCGEN into the structure above; the elements of the structure are filled by the test program; during encoding, only the length of the buffer is encoded in xdr_bytes while the value of the buffer is copied to the stream using xdrrec_putbytes; finally the stream is sent to the network layer of RPC for transmission.)

Figure 11: RPC operations using opaque data type

C. RPC Performance Results using Character Buffer and Opaque data type

We ran our test program for sending data blocks between client and server using the RPC library for both cases (variable-length character buffer and opaque buffer) and found that the performance of the library increases significantly with the opaque data type. Figure 12 gives the throughput results for the different block sizes using both the character buffer (with XDR) and opaque data (without XDR).


Figure 12: Performance Comparison between Character buffer and Opaque data type

D. Improving Performance for smaller data size

Throughput calculated for small block sizes after bypassing the XDR encoding/decoding is still not good, and is even poorer than that calculated with XDR encoding and decoding. This follows from the behavior of the TCP protocol: TCP collects small packets and sends them all at once (Nagle's algorithm) to avoid network congestion. So, in order to improve network latency and in turn enhance the network throughput for smaller blocks, the data should be sent on sockets with the TCP_NODELAY option enabled. Hence we need to control the socket created by the Sun RPC library at both the client and the server end and set the TCP_NODELAY option on it.

Following are the changes required on RPC client and server for enabling TCP_NODELAY option:

In our RPC client we were using the clnt_create function for establishing the TCP connection between the RPC client and server; this function does not give any control over the socket created by the RPC library.

We checked the clnt_create function in the RPC library and found that it internally calls the clnttcp_create library function when the TCP protocol is used.

The RPC client therefore uses the clnttcp_create library function directly for creating the TCP connection between client and server. clnttcp_create gives control over the socket created by the RPC library, as this function takes a pointer to the socket descriptor:

CLIENT *clnttcp_create (struct sockaddr_in *raddr, u_long prog, u_long vers, int *sockp, u_int sendsz, u_int recvsz)

If the socket descriptor value passed through sockp is less than zero, clnttcp_create creates a new socket descriptor.

The RPC client program passes the socket descriptor *sockp = RPC_ANYSOCK (RPC_ANYSOCK is defined as -1 in the library) to clnttcp_create. Hence a new socket is created, and the RPC client program can access the newly created socket descriptor through the sockp pointer and set the TCP_NODELAY socket option on it.

On the server side, the TCP connection is created through the svctcp_create library function. This function also takes a socket descriptor as an argument; if the value of the socket descriptor is less than zero, the function creates a new socket descriptor, otherwise it uses the passed socket descriptor for establishing the connection. Below is the prototype of the svctcp_create function:

svctcp_create (sockfd, sendsz, recvsz)

where sockfd is the socket descriptor and sendsz, recvsz are the send and receive buffer sizes respectively.

Earlier, sockfd = RPC_ANYSOCK was passed as the argument to svctcp_create. As the value of RPC_ANYSOCK is -1, the function creates a new socket descriptor internally and establishes the TCP connection. In order to have control over the socket descriptor, we created our own socket descriptor, enabled the TCP_NODELAY option on it, and passed its value as the sockfd argument to svctcp_create.

After setting the TCP_NODELAY option on both the client and server sockets, it was found that the throughput for small block sizes also increased significantly. The detailed throughput results calculated with the TCP_NODELAY option are presented in the next section, "RPC Performance Comparison".

E. RPC Performance Comparison

The comparison charts (fig. 13) show the comparative I/O performance results with XDR encoding/decoding and with the different optimization options (bypassing encoding/decoding and enabling TCP_NODELAY) that we applied to our RPC test program.

The performance results show significant improvement over the default RPC library implementation. This optimization is useful for distributed environments where RPC applications are used for communication between different nodes.


Figure 13: Performance Comparisons

VI. CONCLUSION

Following are the conclusions that we can derive from the performance analysis of the Sun RPC library:

- RPC has a performance bottleneck in XDR encoding and decoding. In our case we optimized the performance of RPC by bypassing the XDR encoding and decoding.

- By bypassing XDR we also send fewer bytes over the network, as every single character byte was represented as four bytes in the XDR representation.

- Instead of using XDR for enforcing data consistency, we can use the network-to-host conversion functions (and vice versa) on the required fields before sending and after receiving the buffer over the network using RPC. However, while testing we did not use these functions, as both the client and server machines were of the same endianness.

- Also, to achieve high I/O performance for small block sizes, we enabled the TCP_NODELAY socket option. Note that enabling TCP_NODELAY may cause network congestion in some cases.

VII. LEGAL STATEMENT

Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others.

ACKNOWLEDGMENT

This activity is supported by NEC Corporation, Japan. The authors would like to thank the NEC members for their inputs and support.

REFERENCES

[1] “Remote Procedure Calls (RPC)”, http://www.cs.cf.ac.uk/Dave/C/node33.html

[2] Angelos Bilas and Edward W. Felten, “Fast RPC on the SHRIMP Virtual Memory Mapped Network Interface Princeton University Technical Report TR-512-96”. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.37.9542&rep=rep1&type=pdf

[3] “Remote Procedure Calls (RPC)”, http://dpnm.postech.ac.kr/cs600/tanenbaum/RPC-Intro.ppt

[4] Gilles Muller, Eugen-Nicolae Volanschi and Renaud Marlet, “Scaling up Partial Evaluation for Optimizing the Sun Commercial RPC Protocol”. http://imagine.enpc.fr/~marletr/LaBRI/papers/PEPM97-Muller-et-al.pdf

[5] “XDR: External Data Representation Standard”, http://tools.ietf.org/html/rfc4506

[6] “RPC: Remote Procedure Call Protocol Specification Version 2”, http://tools.ietf.org/html/rfc5531

[7] “Remote procedure call”, http://en.wikipedia.org/wiki/Remote_procedure_call

[8] “ONC Developer’s Guide”, http://docs.oracle.com/cd/E23824_01/pdf/821-1671.pdf