Cloud Computing (Draft for Review)

Evaluation Towards Cloud: Overview of Next Generation Computing Architecture

by Monowar Hasan & Sabbir Ahmed

A Thesis submitted to the Department of Computer Science and Engineering in partial fulfillment of the requirements for the degree of Bachelor of Science (B.Sc.) in the Department of Computer Science and Engineering

Bangladesh University of Engineering and Technology
22 March 2012
Dhaka, Bangladesh


Abstract

Cloud Computing has become a buzzword in distributed processing. Originating in the ideas of concurrent processing from Computer Clusters, it enhances the established architecture and standards of the Grid, another parallel-processing technology, with the ideas of Utility Computing and Service-oriented Computing. Cloud Computing provides a business model in the form of X-as-a-Service, where X may be hardware, software, a development platform, or storage media. End-users can consume any of these services on a pay-as-you-go basis without knowing the details of the underlying architecture. The cloud thus provides layers of abstraction to end-users, and gives end-users, developers, and providers the scope to adapt applications to demand.


Acknowledgements

We are grateful to several people, without whom this thesis would not have been a successful one. Our heartfelt thanks to our supervisor, Professor Dr. Md. Humayun Kabir, for his support and valuable guidelines. His continuous feedback and assistance helped us to clarify our ideas and understanding of the topics.

Special thanks to Professor Dr. Hanan Lutfiyya of the University of Western Ontario, Canada, and Professor Dr. Ivona Brandic of the Vienna University of Technology, Vienna, Austria, for providing their research publications, which helped the progress of our thesis.

The Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, provided us with a sound working environment and helped us access on-line publications.

Last but not least, we acknowledge the contribution and support of our family members for being with us and encouraging us all the way. Without their sacrifice this thesis would not have been a successful one.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 Computing with Distributed Units: Computer Clusters
1.1 Distributed Systems
1.1.1 Centralized vs Distributed Systems
1.1.2 Advantages of Distributed Systems
1.1.3 Issues and Challenges in Distributed Systems
1.2 Computer Clusters
1.3 Architecture of Computer Clusters
1.4 Cluster Interconnection
1.5 Protocols for Cluster Communication
1.5.1 Internet Protocols
1.5.2 Low-latency Protocols
1.5.2.1 Active Messages
1.5.2.2 Fast Messages
1.5.2.3 VMMC
1.5.2.4 U-net
1.5.2.5 BIP
1.5.3 Standards for Cluster Communication
1.5.3.1 VIA
1.5.3.2 InfiniBand
1.6 Single System Image (SSI)
1.7 Cluster Middleware
1.7.1 Message-based Middleware
1.7.2 RPC-based Middleware
1.7.3 Object Request Broker
1.8 Concluding Remarks

2 Grid Computing: An Introduction
2.1 Grid Computing: definitions and overview
2.1.1 Virtualization and Grid
2.1.2 Grids over Cluster Computing
2.2 An example of Grid Computing environment
2.3 Grid Architecture
2.3.1 Fabric Layer: Interfaces to Local Resources
2.3.2 Connectivity Layer: Managing Communications
2.3.3 Resource Layer: Sharing of a Single Resource
2.3.4 Collective Layer: Co-ordination with multiple resources
2.3.5 Application Layer: User defined Grid Applications
2.4 Grid Computing with Globus
2.5 Resource Management in Grid Computing
2.5.1 Resource Specification Language
2.5.2 Globus Resource Allocation Manager (GRAM)
2.6 Evolution towards Cloud Computing from Grid
2.7 Concluding remarks

3 An overview of Cloud Architecture
3.1 Cloud Components
3.2 Cloud Architectures
3.2.1 A layered model of Cloud architecture - Cloud ontology
3.2.2 Cloud Business Model
3.2.3 Cloud Deployment Model
3.3 Cloud Services
3.3.1 Infrastructure as a Service (IaaS)
3.3.2 Platform as a Service (PaaS)
3.3.3 Software as a Service (SaaS)
3.4 Virtualization on Cloud
3.5 Example of a Cloud Implementation
3.6 Conclusion

4 Grid and Cloud Computing Comparisons: Similarities & Differences
4.1 Major Focus
4.2 Points of Considerations
4.2.1 Business Model
4.2.2 Scalability issues
4.2.3 Multitasking and Availability
4.2.4 Resource Management
4.2.5 Application Model
4.2.6 Other issues
4.3 Case Study
4.3.1 Comparative results
4.4 Concluding remarks

5 Conclusion and Future works

List of Tables

3.1 Example of existing Cloud Systems w.r.t. classification into layers of Cloud Ontology
3.2 CPU utilization in Full Virtualization and Paravirtualization
4.1 Comparative analysis

List of Figures

1.1 Eras of Computing
1.2 Distributed computing
1.3 Architecture of Cluster Computing
1.4 Categories of Cluster Interconnection Hardware
1.5 Traditional Protocol Overhead and Transmission Time
1.6 The InfiniBand Architecture
2.1 Evolution of Grid Computing
2.2 Resource availability according to demand
2.3 Serving job requests in traditional environment
2.4 Serving job requests in traditional environment
2.5 Google search architecture
2.6 Grid Protocol Architecture
2.7 Collective and Resource layer protocols are combined in various ways to provide application functionality
2.8 Programmer's view of Grid Architecture. Dotted lines denote protocol interactions where solid lines represent a direct call
2.9 A resource management architecture for Grid Computing environment
2.10 Globus GRAM Architecture
2.11 Enhancement of generic Grid architecture to Service Oriented Grid
3.1 Components of a Cloud Computing Solution
3.2 Hierarchical abstraction layers of Cluster, Grid and Cloud Computing
3.3 Cloud layered architecture: consists of five layers; figure represents inter-dependency between layers
3.4 Non-cloud environment needs three servers but in the Cloud, two servers are used
3.5 Cloud computing Business model
3.6 External or Public Cloud
3.7 Internal or Private Cloud
3.8 Example of Hybrid Cloud
3.9 Correlation between Cloud Architecture and Cloud Services
3.10 Infrastructure as a Service
3.11 Platform as a Service
3.12 Software as a Service
3.13 A fully virtualized deployment where operating platform running on servers is displayed
3.14 A Paravirtualized deployment where many OS can run simultaneously
4.1 Motivation of Grid and Cloud
4.2 Comparison regarding performance, reliability and cost

Chapter 1

Computing with Distributed Units: Computer Clusters

The computing industry is one of the fastest growing industries, and it began around 1943. Computers [1, 2] built between 1943 and 1959 are usually regarded as first generation computers and were based on valves and wired circuits. They are [3] characterized by the use of punched cards and vacuum valves. All programming was done in machine code.

The second generation computers were built between 1959 and 1964. They were based on transistors and printed circuits, so they were much smaller. These computers were also more powerful, accepting English-like commands, and so were much more flexible in their applications.

Computers built between 1964 and 1972 are often regarded as third generation computers. They were based on the first integrated circuits, which made for even smaller machines.

Computers built after 1972 are often called fourth generation computers. These computers were based on LSI (Large Scale Integration) of circuits such as microprocessors, typically 500 or more components on a chip. Later developments include VLSI (Very Large Scale Integration), with typically 10,000 components.

The fifth generation computers are based on parallel processing and VLSI integration, and are still being developed. Recent advances in VLSI (Very Large Scale Integration) technology have played a major role in the development of powerful sequential and parallel computers. Software technology is developing fast as well. Mature software, such as operating systems, programming languages, development methodologies, and tools, is now available. This makes possible the development and deployment of applications for scientific, engineering, and commercial needs. In addition, several challenging applications, such as weather forecasting and earthquake analysis, have become the main driving force behind the development of powerful parallel computers.

We can thus divide computing into two prominent eras:

- Sequential Computing Era
- Parallel Computing Era

A graphical view of the changes in computing eras is shown in Figure 1.1. Each computing era started with the hardware architecture of the system, followed by system software (especially operating systems and compilers), then applications, finally reaching its limit with the growth of problem solving environments. Each component of a computing era had to pass through three phases: R&D (research and development), commercialization, and commodity. The technology for developing the components of the parallel era is not yet as mature as that of the sequential era.

There are several reasons for using parallel computers. Some of them are:

Figure 1.1: Eras of Computing


- Parallelism is one of the best ways to overcome the speed bottleneck of a single processor.
- The price/performance ratio of a small cluster-based parallel computer, as opposed to a minicomputer, is much smaller, and consequently a better value.
- Developing and producing systems of moderate speed using parallel architectures is much cheaper than achieving equivalent performance with a sequential system.

In the 1980s it was believed that computer performance was best improved by creating faster and more efficient processors. This idea was challenged by parallel processing, which means linking together two or more computers to jointly solve a computational problem. Since the early 1990s there has been an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards networks of workstations. This was the driving force behind Cluster Computing. Later, several other distributed computing systems were developed, such as Grid Computing and Cloud Computing. In this chapter we discuss Cluster Computing.

1.1 Distributed Systems

A distributed system is a computing system in which several autonomous computers, linked by a computer network, appear to the users of the system as a single computer.

The computers in the network interact with each other in order to achieve a common goal. A program that runs in a distributed system is called a distributed program. By running distributed system software, the computers are enabled to:

- Coordinate their activities.
- Share resources: hardware, software, and data.
- Achieve transparency of resources: the illusion of a single system while running on multiple systems.

Figure 1.2: Distributed computing

Distributed systems are useful for [4] breaking down an application into individual computing agents (Figure 1.2) so that the pieces can be solved easily. These agents are distributed over a network and work together on a cooperative task. They can solve larger problems without larger computers, so they are very cheap in comparison to single-system computing, and distributed systems have therefore become more popular. Typically there is a central server and several clients connected together; various parallel devices are connected to the whole system through the distributed system, and both operator and clients can use them.

1.1.1 Centralized vs Distributed Systems

Here are some [5] differences between centralized and distributed systems.

Centralized systems:

- Centralized systems have non-autonomous components.
- Centralized systems are often built using homogeneous technology.
- Multiple users share the resources of a centralized system at all times.
- Centralized systems have a single point of control and of failure.

Distributed systems:

- Distributed systems have autonomous components.
- Distributed systems may be built using heterogeneous technology.
- Distributed system components may be used exclusively.
- Distributed systems are executed in concurrent processes.
- Distributed systems have multiple points of failure.

1.1.2 Advantages of Distributed Systems

A distributed system has [6] several advantages over a single system. Some of them are:

- Performance: Very often a collection of processors can provide higher performance than a centralized computer. A distributed system also has a better price/performance ratio.
- Distribution: Some applications involve, by their nature, spatially separated machines (banking, commercial, and automotive systems).
- Reliability: Machines may crash. In a single system, if the machine crashes then all data is lost; in a distributed system, if some of the machines crash, the system can survive.
- Incremental growth: As requirements on processing power grow, new machines can be added incrementally.
- Sharing of data/resources: Shared data is essential to many applications (banking, computer-supported cooperative work, reservation systems); other resources can also be shared (e.g. expensive printers).
- Communication: Distributed systems give the opportunity for human-to-human communication.

1.1.3 Issues and Challenges in Distributed Systems

Though there are several advantages of [7] distributed systems, there are some disadvantages as well. Some of them are:

- Difficulty of developing distributed software: It is difficult to develop software for distributed systems, and it is hard to determine what operating systems, programming languages, and applications should look like.
- Networking problems: Several problems are created by the network infrastructure and have to be dealt with: loss of messages, overloading, etc.
- Security problems: Sharing generates the problem of data security.
- More components to fail: As distributed systems deal with larger networks, there are more possibilities of failure of the system and of data transfer.

1.2 Computer Clusters

A cluster [8] is a type of parallel or distributed processing system. It consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource. All the component subsystems of a cluster are supervised within a single administrative domain, usually reside in a single room, and are managed as a single computer system. We can use cluster computing [9] for load balancing as well as for high availability. We can also use cluster computing as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations. Some properties of cluster computing:

- The computers, also known as nodes, of a cluster are networked in a tightly-coupled fashion: they are all on the same subnet of the same domain and are often networked with very high bandwidth connections.
- The nodes of a cluster are homogeneous: they all use the same hardware, run the same software, and are generally configured identically.
- Each node in a cluster is a dedicated resource; generally only the cluster applications run on a cluster node.

In a cluster we use the Message Passing Interface (MPI) [10], a programming interface that allows the distributed application instances to communicate with each other and share information. The dedicated hardware, high-speed interconnects, and MPI used in cluster computing give clusters the ability to work efficiently on fine-grained parallel problems where the subtasks must communicate many times per second, including problems with short tasks, some of which may depend on the results of previous tasks.

1.3 Architecture of Computer Clusters

In cluster computing a node can be a single- or multi-processor system [11]. The nodes can be PCs, workstations, or SMPs with memory, I/O facilities, and an operating system. In cluster computing, two or more nodes are connected together. These nodes can exist in a single cabinet or be physically separated and connected via a LAN. This LAN-based interconnected cluster of computers appears as a single system to users and applications. Cluster computing can provide a cost-effective way to gain features and benefits, such as fast and reliable services, that could previously be found only on more expensive proprietary shared memory systems. The typical architecture of a cluster is shown in Figure 1.3.

Figure 1.3: Architecture of Cluster Computing

A cluster computing system consists of several components. The following are some prominent components of cluster computers:

- Multiple high performance computers: these can be PCs, workstations, or SMPs.
- A state-of-the-art operating system, which can be layered or micro-kernel based.
- High performance networks/switches used to connect the nodes of the cluster; Gigabit Ethernet and Myrinet are the most common.
- Network interface cards used for the cluster interconnection.


- Fast communication protocols and services used to communicate between nodes, such as Active Messages and Fast Messages; later, standards such as InfiniBand emerged for communication.
- A middleware that sits between the operating system and applications, providing the Single System Image (SSI) and System Availability Infrastructure. Middleware can consist of hardware, such as the Digital (DEC) Memory Channel, and an operating system kernel or gluing layer, such as Solaris MC and GLUnix.
- Applications and subsystems, consisting of applications (such as system management tools), runtime systems (such as software DSM and parallel file systems), and resource management and scheduling software (such as LSF, the Load Sharing Facility).
- Parallel programming environments and tools, such as compilers and MPI (Message Passing Interface).
- Both sequential and parallel or distributed applications.

1.4 Cluster Interconnection

In cluster computing the choice of interconnection technology is a key decision. We can classify interconnection technologies into four categories, depending on where the interconnect attaches (the I/O bus or the memory bus) and how the nodes communicate (primarily using messages or using shared storage) [12]. Figure 1.4 illustrates the four types of interconnection.

Figure 1.4: Categories of Cluster Interconnection Hardware.

Among the four interconnection categories, I/O-attached message-based systems are by far the most common. This category includes all commonly used wide-area and local-area network technologies, as well as several recent products specifically designed for cluster computing. I/O-attached shared storage systems include computers that share a common disk sub-system. Memory-attached systems are less common than I/O-attached systems, since the memory bus of an individual computer generally has a design that is unique to that type of computer. However, many memory-attached systems have been implemented, most often in software or with memory-mapped I/O, such as Reflective Memory [13]. There are also several hybrid systems that combine the features of more than one category. An example of a hybrid system is the InfiniBand standard. InfiniBand [14] is an I/O-attached interconnect that can be used to send data to a shared disk sub-system as well as to send messages to another computer. Many factors affect the choice of interconnect technology for a cluster, such as compatibility with the cluster hardware and operating system, price, and performance. The performance of a cluster depends on latency and bandwidth.


Latency is the time needed to send data from one computer to another; it includes the overhead for the software to construct the message as well as the time to transfer the bits from one computer to another. Bandwidth is the number of bits per second that can be transmitted over the interconnect hardware. Applications that use small messages benefit mostly from reduced latency, while applications that send large messages benefit mostly from increased bandwidth. Latency is a function of both the communication software and the network hardware.
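This latency/bandwidth trade-off can be captured in a simple first-order model: total transfer time = latency + message size / bandwidth. The numbers below are illustrative, not measurements of any particular interconnect.

```python
# First-order model of message transfer time over an interconnect.
def transfer_time(size_bits, latency_s, bandwidth_bps):
    # Fixed per-message latency plus serialization time on the wire.
    return latency_s + size_bits / bandwidth_bps

# A 1 KB message (8192 bits) over a hypothetical link with 50 us
# latency and 1 Gbit/s bandwidth: latency dominates.
t_small = transfer_time(8192, 50e-6, 1e9)          # ~58 microseconds
# A much larger message over the same link: bandwidth dominates.
t_large = transfer_time(8192 * 10**4, 50e-6, 1e9)  # ~82 milliseconds
```

The model makes the point in the text concrete: shrinking latency helps small-message applications, while raising bandwidth helps large-message ones.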

1.5 Protocols for Cluster Communication

A communication protocol defines a set [15] of rules and conventions for communication between the nodes of the cluster. Each protocol uses a different technique to exchange information. Communication protocols can be classified as:

- Connection-oriented or connectionless.
- Offering various levels of reliability: a protocol can be reliable, with messages fully guaranteed to arrive in order, or unreliable, with messages not guaranteed to arrive in order.
- Unbuffered (synchronous) or buffered (asynchronous).
- By the number of intermediate data copies between buffers, which may be zero, one, or more.


Several protocols are used in clusters. Initially, traditional Internet protocols were used for clustering. Later, several protocols were designed specifically for cluster communication. Finally, two new protocol standards were developed specially for use in cluster computing.

1.5.1 Internet Protocols

The Internet Protocol (IP) is the standard for networking worldwide. The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are both transport layer protocols built over the Internet Protocol. TCP and UDP, together with the de facto standard BSD sockets Application Programmer's Interface (API) to TCP and UDP, were among the first messaging libraries used for [16] cluster computing. The Internet protocols use one or more buffers in system memory, with the help of operating system services. A user application constructs the message in user memory and then makes an operating system request to copy the message into a system buffer. A system interrupt is required for send and receive. With the Internet protocols, the operating system overhead and the overhead for copies to and from system memory are a significant portion of the total time to send a message. As network hardware became faster during the 1990s, the overhead of the communication protocols became significantly larger than the actual hardware transmission time for messages, as shown in Figure 1.5. This created the need for new types of protocols for cluster computing.
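A minimal loopback example of the BSD sockets API discussed above: a TCP sender and receiver on one machine. Each send() copies the user buffer into system memory via the kernel, which is exactly the per-message overhead that motivated the low-latency protocols.

```python
# TCP over the BSD sockets API: one receiver thread, one sender.
import socket
import threading

def receiver(server_sock, result):
    conn, _ = server_sock.accept()
    with conn:
        result.append(conn.recv(1024))   # kernel buffer -> user memory

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received = []
t = threading.Thread(target=receiver, args=(server, received))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"hello cluster")         # user memory -> kernel buffer
client.close()
t.join()
server.close()
print(received[0])                       # b'hello cluster'
```

Every sendall() and recv() here crosses the user/kernel boundary and copies data through system buffers, which is the overhead shown in Figure 1.5.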


Figure 1.5: Traditional Protocol Overhead and Transmission Time.

1.5.2 Low-latency Protocols

To avoid operating system intervention, several research projects were carried out during the 1990s. These projects led to the development of low-latency protocols, which provide user-level messaging services across high-speed networks. Low-latency protocols developed during the 1990s include Active Messages, Fast Messages, the VMMC (Virtual Memory-Mapped Communication) system, U-net, and the Basic Interface for Parallelism (BIP), among others.

1.5.2.1 Active Messages

Active Messages was developed at the University of California, Berkeley. It [17] is the enabling low-latency communications library for the Berkeley Network of Workstations (NOW) project [18]. Short messages in Active Messages are synchronous and based on the concept of a request-reply protocol. The sending user-level application constructs a message in user memory. To transfer the data, the receiving process allocates a receive buffer in user memory on the receiving side and sends a request to the sender. The sender replies by copying the message from the user buffer on the sending


side directly to the network. No buffering in system memory is performed. The network hardware transfers the message to the receiver, and the message is then transferred from the network to the receive buffer in user memory. Active Messages requires that user virtual memory on both the sending and receiving sides be pinned to an address in physical memory, so that it will not be paged out during the network operation. Once the pinned user memory buffers are established, no operating system intervention is required for a message to be sent. Since no copies from user memory to system memory are used, this protocol is known as a zero-copy protocol. To support multiple concurrent parallel applications in a cluster, Active Messages was extended to Generic Active Messages (GAM). In GAM, a copy sometimes occurs to a buffer in system memory on the receiving side so that user buffers can be reused more efficiently. In this case, the protocol is referred to as a one-copy protocol.

1.5.2.2 Fast Messages

Fast Messages was developed at the University of Illinois and is similar to Active Messages [19]. Fast Messages extends Active Messages by imposing stronger guarantees on the underlying communication: it guarantees that all messages arrive reliably and in order, even if the underlying network hardware does not. Fast Messages uses flow control to ensure that a fast sender cannot overrun a slow receiver and thereby cause messages to be lost. Flow control is implemented in Fast Messages with a credit system that manages pinned memory in the host computers.
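The credit scheme can be sketched as follows: the sender holds a fixed number of credits, one per pinned receive buffer on the receiver; each send spends a credit, and the receiver returns the credit once the buffer is drained. The class below is an illustrative model, not Fast Messages' actual implementation.

```python
# Credit-based flow control: a sender cannot outrun the receiver's
# supply of pinned receive buffers.
from collections import deque

class CreditChannel:
    def __init__(self, credits):
        self.credits = credits        # free receive buffers available
        self.queue = deque()          # messages "in flight"

    def send(self, msg):
        if self.credits == 0:
            return False              # no buffer free: sender backs off
        self.credits -= 1             # spend one credit
        self.queue.append(msg)
        return True

    def receive(self):
        msg = self.queue.popleft()
        self.credits += 1             # buffer drained: credit returned
        return msg

ch = CreditChannel(credits=2)
assert ch.send("m1") and ch.send("m2")
assert not ch.send("m3")              # fast sender blocked, not lost
assert ch.receive() == "m1"           # receiver drains a buffer
assert ch.send("m3")                  # credit returned, send proceeds
```

Because a send is refused rather than silently dropped when credits run out, reliability is preserved without buffering in system memory.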


1.5.2.3 VMMC

The Virtual Memory-Mapped Communication (VMMC) [20] system was the low-latency protocol for the Princeton SHRIMP project. One goal of VMMC is to view messaging as reads and writes into the user-level virtual memory system. VMMC works by mapping a page of user virtual memory to physical memory, making a correspondence between pages on the sending and receiving sides. It uses specially designed hardware that allows the network interface to snoop writes to memory on the local host and have these writes automatically updated in the remote host's memory. Various optimisations of these writes have been developed that help to minimize the total number of writes and the network traffic, improving overall application performance. VMMC is an example of a paradigm known as distributed shared memory (DSM). In DSM systems memory is physically distributed among the nodes, but processes in an application may view shared memory locations as identical and perform reads and writes to the shared memory locations.

1.5.2.4 U-net

The U-net network interface architecture [21] was developed at Cornell University. U-net provides zero-copy messaging where possible and adds the concept of a virtual network interface for each connection in a user application. Just as an application has a virtual memory address space that is mapped to real physical memory on demand, each communication endpoint of the application is viewed as a virtual network interface mapped to a real set of network buffers and queues on demand.

The advantage of this architecture is that once the mapping is defined, each active interface has direct access to the network without operating system intervention. The result is that communication can occur with very low latency.

1.5.2.5 BIP

BIP (Basic Interface for Parallelism) [22] is a low-latency protocol that was developed at the University of Lyon. BIP is designed as a low-level message layer over which a higher-level layer, such as the Message Passing Interface (MPI) [10], can be built. Programmers can use MPI over BIP for parallel application programming. The initial BIP interface consisted of both blocking and non-blocking calls. Later versions (BIP-SMP) provide multiplexing between the network and shared memory under a single API for use on clusters of symmetric multiprocessors.

BIP achieves low latency and high bandwidth by using different protocols for various message sizes. It also provides a zero or single memory copy of user data. To simplify the design and keep the overheads low, BIP guarantees in-order delivery of messages, although some flow control issues for small messages are passed to higher software levels.
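The size-based protocol switch described above can be sketched as follows. This is an illustrative sketch only: the threshold value and the protocol names ("eager", "rendezvous") are assumptions, not BIP's actual internals.

```python
# Sketch of how a message layer may pick a transfer strategy by size.
EAGER_LIMIT = 4096  # bytes; an illustrative cutoff, not BIP's real value

def choose_protocol(msg_len: int) -> str:
    if msg_len <= EAGER_LIMIT:
        # Small message: copy into a pre-posted buffer; one extra copy
        # but no handshake, so latency stays low.
        return "eager"
    # Large message: handshake first, then a (near) zero-copy transfer
    # directly from user memory, so bandwidth stays high.
    return "rendezvous"

print(choose_protocol(256), choose_protocol(1 << 20))
```

The same idea, choosing cheaper bookkeeping for small messages and copy avoidance for large ones, recurs in most low-latency messaging layers.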

1.5.3 Standards for Cluster Communication

Research on low-latency protocols had progressed sufficiently that a new standard for low-latency messaging, the Virtual Interface Architecture (VIA), was developed. During a similar period of time industrial researchers worked on standards for shared storage subsystems. The combination of the efforts of many researchers has resulted in the InfiniBand standard.

1.5.3.1 VIA

The Virtual Interface Architecture (VIA) [23] is a communications standard that combines many of the best features of various academic projects. A consortium of academic and industrial partners, including Intel, Compaq, and Microsoft, developed the standard. VIA supported heterogeneous hardware and was available as of early 2001. It is based on the concept of a virtual network interface. Before a message can be sent in VIA, send and receive buffers must be allocated and pinned to physical memory locations. No system calls are needed after the buffers and associated data structures are allocated. A send or receive operation in a user application consists of posting a descriptor to a queue. The application can choose to wait for a confirmation that the operation has completed, or can continue host processing while the message is being processed.

Several hardware vendors and some independent developers have developed VIA implementations for various network products [24, 25]. VIA implementations can be classified as native or emulated. A native implementation of VIA off-loads a portion of the processing required to send and receive messages to special hardware on the network interface card.

When a message arrives in a native VIA implementation, the network card performs at least a portion of the work required to copy the message into user memory. In an emulated VIA implementation, the host CPU performs the processing to send and receive messages. Although the host processor is used in both cases, an emulated implementation of VIA has less overhead than TCP/IP. However, the services provided by VIA are different from those provided by TCP/IP, since communication may not be guaranteed to arrive reliably in VIA.
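The posted-descriptor model of VIA, where sends are queued without system calls and completed asynchronously, can be sketched in a few lines. The class and method names below are illustrative stand-ins, not the real VIA API, and the "NIC" is simulated by an ordinary method call.

```python
from collections import deque

class VirtualInterface:
    """Toy model of a VIA-style interface: work is *posted* as
    descriptors to a queue and completed later (illustrative names)."""
    def __init__(self):
        self.send_queue = deque()   # posted send descriptors
        self.recv_done = deque()    # completed receives on this endpoint

    def post_send(self, buf: bytes):
        # Posting is just a user-space queue write: no system call.
        self.send_queue.append(buf)

    def process(self, peer: "VirtualInterface"):
        # Stand-in for the network hardware draining the send queue.
        while self.send_queue:
            peer.recv_done.append(self.send_queue.popleft())

a, b = VirtualInterface(), VirtualInterface()
a.post_send(b"ping")       # returns immediately; host keeps computing
a.process(b)               # "hardware" delivers the message
print(b.recv_done.popleft())
```

The key point mirrored here is that the application's fast path touches only user-level queues; the operating system is involved only at setup time.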

1.5.3.2 InfiniBand

The InfiniBand standard [26] is another cluster protocol standard, supported by a large consortium of industrial partners, including Compaq, Dell, Hewlett-Packard, IBM, Intel, Microsoft and Sun Microsystems. The InfiniBand architecture replaces the standard shared bus for I/O on current computers with a high-speed serial, channel-based, message-passing, scalable, switched fabric. All systems and devices attach to the fabric through one of two types of adapters: host channel adapters (HCA) or target channel adapters (TCA), as shown in Figure 1.6. In InfiniBand data is sent in packets, and six types of transfer methods are available: reliable and unreliable connections, reliable and unreliable datagrams, multicast connections, and raw packets.


Figure 1.6: The InfiniBand Architecture

InfiniBand supports remote direct memory access (RDMA) read and write operations. This allows one processor to read or write the contents of memory at another processor. InfiniBand also directly supports IPv6 [27] messaging for the Internet. There are several components of InfiniBand:

Host channel adapter (HCA): An interface that resides within a server. It communicates directly with the server's memory, processor, and a target channel adapter or switch. It guarantees delivery of data and can recover from transmission errors.

Target channel adapter (TCA): Enables I/O devices to be located within the network independent of a host computer. It includes an I/O controller that is specific to its particular device's protocol. TCAs can communicate with an HCA or a switch.

Switch: Virtually equivalent to a traffic cop. It allows many HCAs and TCAs to connect to it and handles network traffic, offering higher availability, higher aggregate bandwidth, load balancing, data mirroring and much more. It looks at the local route header on each packet of data and forwards it to the appropriate location. A group of switches is referred to as a fabric. If a host computer is down, the switch still continues to operate. The switch also frees up servers and other devices by handling network traffic.

Router: Forwards data packets from a local network (called a subnet) to other external subnets. It reads the global route header and forwards the packet to the appropriate address, rebuilding each packet with the proper local address header as it passes it to the new subnet.

Subnet Manager: An application responsible for configuring the local subnet and ensuring its continued operation. Configuration responsibilities include managing switch and router setups and reconfiguring the subnet if a link goes down or a new one is added.

The IBA comprises four primary layers that describe communication devices and methodology.

Physical Layer: Defines the electrical and mechanical characteristics of the IBA, including the cables, connectors and hot-swap characteristics. IBA connectors include fiber, copper and backplane connectors. There are three link speeds, specified as 1X, 4X and 12X. A 1X link cable has four wires, two for each direction of communication (read and write).

Link Layer: Includes packet layout, point-to-point link instructions, switching within a local subnet and data integrity. There are two types of packets, management and data. Management packets handle link configuration and maintenance. Data packets carry up to 4 kilobytes of transaction payload. Every device in a local subnet has a local ID (LID) for forwarding data appropriately. The layer handles data integrity by including variant and invariant cyclic redundancy checks (CRC). The variant CRC checks fields that change from point to point, and the invariant CRC provides end-to-end data integrity.

Network Layer: The network layer is responsible for routing packets from one subnet to another. The global route header located within a packet includes an IPv6 address for the source and destination of each packet. For single-subnet environments, the network layer information is not used.

Transport Layer: Handles the order of packet delivery. It also handles partitioning, multiplexing and transport services that determine reliable connections.

1.6 Single System Image (SSI)

Single System Image (SSI) is a property through which we can view a distributed system as a single unified computing resource. This property hides the distributed and heterogeneous nature of the available resources and represents them before the users as a single, powerful, unified computing resource [28]. A system using SSI gives the users a system view of the resources available to them, but they don't have to know the nodes to which those resources are physically attached. These resources can range from access and manipulation of remote processes to the use of a global file system. SSI provides high availability, so the system can continue to operate after some failures. It also ensures that the nodes are evenly loaded. SSI cluster-based systems are mainly focused on complete transparency of resource management, scalable performance and system availability in supporting user applications [28, 29, 30, 31, 32]. The following are among the desirable key SSI attributes: point of entry, user interface, process space, I/O and memory space, job-management system, and point of management and control.

The most important benefits of SSI [28] include: SSI allows the use of resources in a transparent way; the user doesn't have to think about their physical location. It offers the same command syntax as in other systems and thus reduces the risk of operator errors, with the result that end-users see improved performance, reliability and higher availability of the system. The end-user doesn't have to know where in the cluster an application will run. SSI greatly simplifies system management and thus reduces the cost of ownership. It promotes the development of standard tools and utilities.

1.7 Cluster Middleware

Middleware is the layer of software sandwiched between the operating system and applications. It has re-emerged as a means of integrating software applications that run in a heterogeneous environment. There is a large overlap between the infrastructure that is provided to a cluster by high-level Single System Image (SSI) services and that provided by the traditional view of middleware. Middleware helps a developer overcome three potential problems with developing applications on a heterogeneous cluster: it gives the ability to access software inside or outside their site, it helps to integrate software from different sources, and it enables rapid application development.

The services that middleware provides are not restricted to application development. Middleware also provides services for the management and administration of a heterogeneous system.

1.7.1 Message-based Middleware

Message-based middleware uses a common communications protocol to exchange data between applications. The communications protocol hides many of the low-level message passing primitives from the application developer. Message-based middleware software can pass messages directly between applications, send messages via software that queues waiting messages, or use some combination of the two. Examples of this type of middleware are the three upper layers of the OSI model [33]: the session, presentation and application layers.
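The queued style of message passing can be sketched with a small in-process broker. This is an illustrative sketch, not any particular middleware product: the broker is a plain queue, and the "applications" are threads that never reference each other directly.

```python
import queue
import threading

broker = queue.Queue()   # the queueing middleware: holds waiting messages

def producer():
    # The sender needs no knowledge of who will consume the message,
    # only of the message format agreed with the middleware.
    broker.put({"topic": "jobs", "body": "render frame 7"})

received = []

def consumer():
    msg = broker.get()   # blocks until a message is waiting
    received.append(msg["body"])

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()
print(received)
```

Because the queue decouples the two sides, the producer and consumer can start in either order and run at different speeds, which is exactly the property message-queueing middleware offers across machines.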

1.7.2 RPC-based Middleware

There are many applications where the interactions between processes in a distributed system are remote operations, often with a return value. For these applications Remote Procedure Call (RPC) is used. The implementation of the client/server model in terms of RPC allows the code of the application to remain the same whether the procedures are local or remote. Inter-process communication mechanisms serve four important functions [34]: They offer mechanisms against failure and provide the means to cross administrative boundaries. They allow communications between separate processes over a computer network. They enforce clean and simple interfaces, thus providing a natural aid for the modular structure of large distributed applications.

They hide the distinction between local and remote communication, thus allowing static or dynamic reconfiguration.
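The hiding of the local/remote distinction rests on stubs that marshal a call into a message and unmarshal the reply. The toy RPC layer below makes that visible; the function names and the JSON wire format are illustrative assumptions, not a real RPC framework, and the "network" is just a local function call.

```python
import json

def add(a, b):
    return a + b

# Procedures the "server" is willing to execute remotely.
PROCEDURES = {"add": add}

def server_dispatch(request: str) -> str:
    # Unmarshal the request, invoke the named procedure, marshal result.
    msg = json.loads(request)
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result})

def client_stub(proc_name, *args):
    # The caller sees an ordinary function call; the stub marshals the
    # name and arguments, "sends" them, and unmarshals the reply.
    request = json.dumps({"proc": proc_name, "args": args})
    reply = server_dispatch(request)   # stand-in for the network hop
    return json.loads(reply)["result"]

print(client_stub("add", 2, 3))
```

Replacing `server_dispatch` with a socket send/receive pair turns the same structure into a genuine remote call, which is why application code can stay unchanged when a procedure moves to another machine.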

1.7.3 Object Request Broker

An Object Request Broker (ORB) is a type of middleware that supports the remote execution of objects. An international ORB standard is CORBA (Common Object Request Broker Architecture). It is supported by more than 700 groups and managed by the Object Management Group (OMG) [35]. The OMG is a non-profit organization whose objective is to define and promote standards for object orientation in order to integrate applications based on existing technologies.

The Object Management Architecture (OMA) is characterized by the following: The Object Request Broker (ORB): the controlling element of the architecture; it supports the portability of objects and their interoperability in a network of heterogeneous systems. Object services: specific system services for the manipulation of objects, whose goal is to simplify the process of constructing applications. Application services: these offer a set of facilities allowing applications to access databases and printing services, to synchronize with other applications, and so on. Application objects: these allow the rapid development of applications; a new application can be formed from objects in a combined library of application services.


1.8 Concluding Remarks

At this beginning of the thesis, we have studied the necessity of and issues related to parallel computation, focusing on the architectures, protocols and standards of Computer Clusters. The motivation of distributed processing using Computer Clusters leads to a more advanced technology named Grid Computing, which we discuss in the next chapter.


Chapter 2
Grid Computing: An Introduction

Grid Computing, more specifically a Grid Computing System, is a virtualized distributed environment. A Grid environment provides dynamic runtime selection, sharing and aggregation of geographically distributed resources based on the availability, capability, performance and cost of these computing resources. Fundamentally, Grid Computing is an advanced form of distributed processing which combines a decentralized architecture for managing computing resources with a layered hierarchical architecture for providing services to the user [36].

The rest of the chapter is organized as follows. We begin our discussion with the definition of Grid Computing and the benefits of virtualization on the Grid in Section 2.1. In Sections 2.3 and 2.4 we consider the underlying layers of Grid Computing in detail. Resource management architecture is discussed in Section 2.5, and a protocol for resource management (GRAM) is discussed in Section 2.5.2. We conclude our discussion in Section 2.6 by introducing a new approach to distributed processing named Cloud Computing.


2.1 Grid Computing: definitions and overview

The concept of the Grid was introduced in the early 1990s, when high performance computers were connected by fast data communication links. The motivation of that approach was to support calculation- and data-intensive scientific applications. Figure 2.1 [37] shows the evolution of the Grid over time.

Figure 2.1: Evolution of Grid Computing

The basis of the Grid is the co-allocation of distributed computational resources. The most cited definition of the Grid is [38]:

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.

Again, according to IBM's definition [39],

A grid is a collection of distributed computing resources available over a local or wide area network that appear to an end user or application as one large virtual computing system. The vision is to create virtual dynamic organizations through secure, coordinated resource-sharing among individuals, institutions, and resources.

A Grid Computing environment must include: Coordinated resources: the Grid environment must be facilitated with the necessary infrastructure for co-ordination of resources based upon policies and service level agreements. Open standard protocols and frameworks: open standards can provide interoperability and integration facilities. These standards should be applied for resource discovery, resource access and resource co-ordination. The Open Grid Services Infrastructure (OGSI) [40] and the Open Grid Services Architecture (OGSA) [41] were published by the Global Grid Forum (GGF) as proposed recommendations for this approach. Grid Computing can also be distinguished from High Performance Computing (HPC) and Clustered Systems in the following way: the Grid focuses on resource sharing and can result in HPC, whereas HPC does not necessarily involve sharing of resources [42].

2.1.1 Virtualization and Grid

Virtualization is the process of making resources accessible to a user as if they were a single, larger, homogeneous resource. Virtualization supports the concept of dynamically shifting resources across various platforms so that computing demands can be scaled with available resources [43]. Figure 2.2 shows the necessity of virtualization to support the proper utilization of resources. Although average utilization of the resources may be relatively low, during peak cycles the server might be overtaxed and resources may not be available.

Figure 2.2: Resource availability according to demand

Grid environments can support the benefits of virtualization. The Grid enables the abstraction of distributed systems and resources such as processing, network bandwidth and data storage to create a single system image. Such abstraction provides continuous access to a large pool of IT capabilities. Figures 2.3 and 2.4 [37] compare the Grid environment with traditional computation. In Figure 2.4 an organization-owned computational grid is shown, where a scheduler sets policies and priorities for placing jobs in the grid infrastructure.

2.1.2 Grids over Cluster Computing

Computer Clusters, detailed in Chapter XX, are local to the domain. Clusters are designed to resolve the problem of inadequate computing power. They provide more computational power by pooling computational resources and parallelizing the workload. As Clusters provide dedicated functionality to a local domain, they are not a suitable solution for resource sharing between users of various domains. Nodes in a Cluster are controlled centrally, and the Cluster manager monitors the state of the nodes [44]. So, in brief, Cluster units provide only a subset of Grid functionality.

Figure 2.3: Serving job requests in traditional environment

2.2 An example of Grid Computing environment

We consider searching the world wide web with Google as an example of a Grid Computing environment. Figure 2.5 shows the abstract view of the Google search architecture [45]. Google processes tens of thousands of queries per second. Each query is first received by one of the Web Servers, which then passes it to the array of Index Servers. Index Servers are responsible for keeping an index of the words and phrases found in websites. The servers are distributed over several machines, and hence the search runs concurrently. In a fraction of a second, the index servers perform a logical AND operation and return references to the websites containing the query (search phrase). The resultant references are then sent to the Store Servers. Store Servers maintain compressed copies of all the pages known to Google. These compressed copies are used to prepare page snippets, finally presented to the end user in a readable form.
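The logical AND over per-word page lists can be sketched with a toy inverted index. The index contents below are invented for illustration; a real index server holds one shard of a far larger structure and intersects its lists in parallel with its peers.

```python
# Toy inverted index: for each word, the set of pages containing it.
index = {
    "grid":  {"page1", "page3"},
    "cloud": {"page2", "page3"},
}

def search(query_words):
    # Logical AND of the query terms = intersection of their page sets.
    results = set.intersection(*(index.get(w, set()) for w in query_words))
    return sorted(results)

print(search(["grid", "cloud"]))
```

Each index server performs this intersection over its own shard of the web, and the per-shard results are merged, which is what makes the overall search embarrassingly parallel.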


Figure 2.4: Serving job requests in Grid environment

Crawler machines continuously crawl the web and update the Google database of pages stored in the Index and Store servers. The Store Servers thus contain relatively recent compressed copies of all the pages available on the web.

Grid Computing can facilitate the above scenario of efficient searching. As stated earlier, the servers are distributed and the searching should be parallel in order to achieve efficiency. The infrastructure also needs to scale with the growth of the web as the number of pages and indexes increases. Different organizations share numerous servers with Google. Google is allowed to copy their content and transform it into its local resources. The local resources comprise the keyword database of the Index Servers and the cached content in the database of the Store Servers. The resources are partially shared with end-users, who send queries through their browsers. Users can then directly contact the original servers to request the full content of a web page.

Figure 2.5: Google search architecture

Google also shares its computing cycles. It shares its computing resources, such as storage and computing capability, with the end-user by performing data caching, ranking and searching of queries.

2.3 Grid Architecture

In this section we discuss the Grid architecture, which identifies the basic components of a grid system. It also defines the purpose and functions of such components, and indicates how these components actually interact with one another. Here we present the Grid architecture in accordance with the Internet protocol architecture: as the Internet protocol architecture extends from network to application, we can relate the Grid layers to the Internet layers [46]. Figure 2.6 shows the Grid layers from top to bottom.

Figure 2.6: Grid Protocol Architecture

In the Grid architecture described in [46], the Resource and Connectivity protocols are responsible for sharing individual resources. The protocols in this layer are designed to be implemented on top of various types of resources, which we identify as the Fabric layer. The raw Fabric, however, can be used to support application-specific requirements.

2.3.1 Fabric Layer: Interfaces to Local Resources

The Fabric layer provides the resources that can be shared in a Grid environment. Examples of such resources are computational resources, storage systems, sensors and network systems. The Grid architecture does not deal with the internals of logical resources, for example distributed file systems, whose implementation requires their own internal protocols [46].

Components of the Fabric layer implement the local and resource-specific operations on particular resources, whether physical or logical. These resource-specific operations provide the functionality for sharing operations at higher levels. In order to support sharing mechanisms we need to provide [44]: an inquiry mechanism so that the components of the Fabric are able to discover and monitor resources, and appropriate (either application-dependent or unified, or both) resource management functionality to control the QoS in the Grid environment.

2.3.2 Connectivity Layer: Managing Communications

The Connectivity layer defines the core communication and authentication protocols necessary for grid networks. Communication protocols transfer data between Fabric layer resources. Authentication protocols build on the communication services to provide cryptographically secure mechanisms for Grid users and resources.

The communication protocol can work with any networking layer protocol that supports transport, routing, and naming functionality. In computational Grids, the TCP/IP Internet protocol stack is commonly used [46].

2.3.3 Resource Layer: Sharing of a Single Resource

The Resource layer sits on top of the Connectivity layer and defines the protocols, along with APIs and SDKs, for secure negotiation, monitoring, initialization, control and payment of sharing operations on individual resources. The Resource layer uses Fabric layer interfaces and functions to access and control local resources. This layer considers only local and individual resources and therefore ignores global resource management issues [46]. To share a single resource, two classes of Resource layer protocols are needed [46]: Information protocols: Information protocols are used to discover information about the state and structure of a resource, for example the configuration of the resource, its current load state, usage policy or cost. Management protocols: Management protocols in the Resource layer are used to control access to a shared resource. The protocols specify resource requirements, which include advanced reservation and QoS, and the operations on resources; such operations include process creation, data access etc. Protocols also need to be present to support monitoring of application status and termination of an operation.

2.3.4 Collective Layer: Co-ordination with multiple resources

The Resource layer, described in Section 2.3.3, deals with the operation and management of a single resource. For global resource co-ordination, Collective layer protocols are used. This layer provides the necessary APIs and SDKs associated not with any specific resource but with the global resources of the overall grid environment.

Figure 2.7: Collective and Resource layer protocols are combined in various ways to provide application functionality

The implementation of Collective layer functions can be built on Resource layer or other Collective layer protocols and APIs [46]. Figure 2.7 shows a Collective co-allocation API and SDK that uses a Resource layer management protocol to control resources. On top of this, we define a co-reservation service protocol and the service itself, which calls the co-allocation API to implement co-allocation operations and provides additional functionality such as authorization and fault tolerance. An application can then use the co-reservation service protocol to request and perform end-to-end reservations.

2.3.5 Application Layer: User defined Grid Applications

The top layer of the Grid consists of user applications, which are constructed by utilizing the services defined at each lower layer. At each layer, we have well-defined protocols that provide access to useful services, for example resource management, data access and resource discovery. Figure 2.8 shows the correlation between the different layers [46]. APIs are implemented by SDKs, which use Grid protocols to provide functionality to the end user. A higher-level SDK can also provide functionality that is not directly mapped to a specific protocol; it may combine protocol operations with calls to additional APIs to implement local functionality.

Figure 2.8: Programmer's view of the Grid Architecture. Dotted lines denote protocol interactions, while solid lines represent a direct call


2.4 Grid Computing with Globus

Globus [47] provides a software infrastructure so that applications can treat distributed computing resources as a single virtual machine [48]. The Globus Toolkit, the core component of the infrastructure, defines the basic services and capabilities required for a computational Grid. Globus is designed as a layered architecture in which high-level global services are built on top of low-level local services. In this section we discuss how the Globus Toolkit protocols interact with the Grid layers.

Fabric Layer: The Globus Toolkit is designed to use existing fabric components [46]. For example, enquiry software is provided for discovering structure and state information of various common resources, such as computer information (e.g. OS version, hardware configuration), storage systems (e.g. available space) and so on. Resource management at this level is normally assumed to be the domain of local resource managers, on which the higher-level protocols (particularly at the Resource layer) build.

Connectivity Layer: Globus uses the public-key based Grid Security Infrastructure (GSI) protocols [49, 50] for authentication, communication protection, and authorization. GSI extends the Transport Layer Security (TLS) protocols [51] to address the issues of single sign-on, delegation, and integration with various local security solutions.

Resource Layer: The Grid Resource Information Protocol (GRIP) [52] is used to define a standard resource information protocol. The HTTP-based Grid Resource Access and Management (GRAM) protocol [53] is used for the allocation of computational resources and also for monitoring and controlling computation on those resources. An extended version of FTP, GridFTP [54], is used for partial file access and management of parallelism in high-speed data transfers [46].

The Globus Toolkit defines client-side C and Java APIs and SDKs for these protocols. Server-side SDKs are also provided for each protocol, to support the integration of various resources, for example computational, storage and network resources, into the Grid [46].

Collective Layer: Grid Information Index Servers (GIISs) support arbitrary views on resource subsets; the LDAP information protocol is used to access resource-specific GRISs to obtain resource state, and the Grid Resource Registration Protocol (GRRP) is used for resource registration. A couple of replica catalog and replica management services are also used to support the management of dataset replicas. There is an online credential repository service, known as MyProxy, that provides secure storage for proxy credentials [55]. The Dynamically-Updated Request Online Coallocator (DUROC) provides an SDK and API for resource co-allocation [56].

2.5 Resource Management in Grid Computing

In this section we discuss a resource management architecture for Grid environments described in [53]. A block diagram of the architecture is shown in Figure 2.9. To communicate requests for resources between components, a Resource Specification Language (RSL) is used, which is described in detail in Section 2.5.1. Through a process called specialization, Resource Brokers transform a high-level RSL specification into a concrete specification of resources. This specification of the request, named a ground request, is passed to a co-allocator, which is responsible for the allocation and management of resources at multiple sites. A multi-request is a request that involves resources at multiple sites. Resource co-allocators can break such a multi-request into components and pass each element to the appropriate resource manager. The information service, working between the Resource Broker and the Co-allocator, is responsible for giving access to the availability and capability of resources.

Figure 2.9: A resource management architecture for Grid Computing environment

2.5.1 Resource Specification Language

The Resource Specification Language (RSL) is a combination of parameter specifications built with the operators:

& : conjunction of parameter specifications
| : disjunction of parameter specifications
+ : combining two or more requests into a single compound request, or multi-request

Resource brokers, co-allocators and resource managers each define a set of parameter names. Resource managers generally recognize two types of parameter name in order to communicate with local schedulers:

MDS attribute names: to express constraints on resources, for example memory>64 or network=atm.

Scheduler parameters: used to communicate information related to the job, i.e. count (number of nodes required), max_time (maximum time required), executable, environment (environment variables) etc.

For example, the following simple specification, taken from [53],

&(executable=myprog)
 (|(&(count=5)(memory>=64))
   (&(count=10)(memory>=32)))

requests 5 nodes with at least 64 MB of memory, or 10 nodes with at least 32 MB of memory. Here, executable and count are scheduler parameters.

Again, the following is an example of a multi-request:

+(&(count=80)(memory>=64)
   (executable=my_executable)
   (resourcemanager=rm1))
 (&(count=256)(network=atm)
   (executable=my_executable)
   (resourcemanager=rm2))

Here two requests are concatenated by the + operator. This is also an example of a ground request, as every component of the request specifies a resource manager.
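The way a resource manager evaluates the individual parameter constraints above can be sketched with a toy matcher. This handles only flat constraints of the form name OP value; it is an illustrative sketch, not the actual RSL grammar or the Globus parser, and the sample node attributes are invented.

```python
import operator
import re

# Comparison operators recognised in simple RSL-style constraints.
OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}

def satisfies(resource: dict, constraint: str) -> bool:
    """Check one constraint like 'memory>=64' or 'network=atm'
    against a resource description."""
    name, op, value = re.match(r"(\w+)(>=|<=|>|<|=)(\w+)",
                               constraint).groups()
    if value.isdigit():
        value = int(value)          # numeric attributes compare as ints
    return OPS[op](resource[name], value)

node = {"memory": 128, "network": "atm", "count": 10}
print(all(satisfies(node, c) for c in ["memory>=64", "network=atm"]))
```

A full RSL evaluator would additionally recurse through the &, | and + combinators, applying this per-constraint check at the leaves.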

2.5.2 Globus Resource Allocation Manager (GRAM)

The Globus Resource Allocation Manager (GRAM) is designed to run jobs remotely and provides an API for submitting, monitoring, and terminating jobs. GRAM is the lowest level of the Globus resource management architecture [57].

Figure 2.10: Globus GRAM Architecture

Figure 2.10 shows the basic architecture of GRAM. When a job is submitted, the request is sent to the gatekeeper of the remote computer. The gatekeeper handles the request and creates a job manager for the job. The job manager then starts and monitors the remote program, communicating state changes back to the user on the local machine. When the remote application terminates (either normally or by failing), the job manager also terminates [57].
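The job-manager part of this flow, start the program, monitor it, report state changes, exit when it terminates, can be sketched on a single host with a subprocess. This is an analogy only: the state names and the callback are illustrative, not GRAM's actual protocol, and the gatekeeper/remote-host machinery is omitted.

```python
import subprocess
import sys

def job_manager(argv, report):
    """Run one job and report state changes, loosely mimicking the
    role of a GRAM job manager (state names are illustrative)."""
    report("PENDING")                 # request accepted, job not yet running
    proc = subprocess.Popen(argv)     # start the program
    report("ACTIVE")                  # job is now executing
    rc = proc.wait()                  # monitor until it terminates
    report("DONE" if rc == 0 else "FAILED")
    return rc

states = []
job_manager([sys.executable, "-c", "pass"], states.append)
print(states)
```

In GRAM the `report` callback corresponds to state-change messages sent back over the network to the submitting user, and the gatekeeper is the component that creates one such job manager per accepted request.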

2.6 Evolution towards Cloud Computing from Grid

The convergence of Grid Computing with Service-Oriented Computing (SOC) exposes Grid functionality in the form of services. A service-oriented Grid offers virtualization of the available resources, which increases the versatility of the Grid [58]. It also binds Grid-specific services at the hardware level to application services. With the help of Grid Computing it is possible to integrate heterogeneous physical resources into a virtualized and centrally accessible computing unit. Based on the convergence with SOC, Grid Computing is offered in the form of Grid services [42], as shown in Figure 2.11.

Figure 2.11: Enhancement of generic Grid architecture to Service Oriented Grid

In order to meet market demands, providers approach offering the following functionality [42]: a scalable, flexible, robust and reliable physical infrastructure; platform services to enable programmatic access to the physical infrastructure with abstraction in the interfaces; and SaaS (described in Chapter XX) supported by a scalable physical infrastructure. All this is emerging in new online platforms, referred to as Cloud Computing, that provide X-as-a-Service products, which we discuss in the next chapter.

2.7

Concluding remarks

In this Chapter, we have given a brief discussion of the Grid Computing environment and compared it with traditional Clusters. We have also discussed the layered architecture of the Grid. As an implementation of the Grid, we considered the Globus toolkit and correlated the Grid layers with their Globus implementations. Later, we discussed resource management issues in the Grid, focusing on how the GRAM protocol is actually used in the Globus toolkit to manage resource requests. We concluded the chapter by introducing Cloud Computing, a new trend in distributed systems inspired by Grid and Service-oriented Computing.


Chapter 3

An overview of Cloud Architecture

In a Cloud environment, hardware and software services are stored on web servers (the Cloud), rather than on a single computer, and are reached through the Internet. Cloud computing delivers IT functionality to external users by obtaining that functionality from external providers as services, in a pay-per-use manner, over the Internet. These Cloud services are consumed via a web browser or a defined API [59].

The rest of the chapter is organized as follows: we begin our discussion with a detailed architectural overview of the Cloud Computing environment in Section 3.2. Cloud services (PaaS, IaaS, SaaS) are discussed in detail in Section 3.3, and virtualization in the Cloud is discussed in Section 3.4. We conclude the Chapter by explaining a practical Cloud implementation in Section 3.5.

3.1

Cloud Components

Cloud environments consist of the following elements: Clients, Data-centers and Distributed Servers [60]. These components are combined together to build a Cloud Computing solution, as shown in Figure 3.1. Each element has distinct functionalities, which we describe next.

Figure 3.1: Components of a Cloud Computing Solution

i. Clients:

Clients are the same as in traditional Local Area Networks (LANs). In general, clients are computers or machines used for accessing functionality. These machines may include laptops, tablet computers, mobile or cellular phones and PDAs, chosen for their mobility. Clients are generally classified into the following three categories:

Mobile clients: mobile devices like PDAs or smartphones. Examples are the Blackberry, Windows Mobile smartphones, the iPhone/iPad etc.

Thin clients: computers that do not have an internal hard drive; instead, the server does all the work and the client's task is to display the information. Generally used as terminals.

Thick clients: regular computers, using a web browser to connect to the Cloud.

ii. Data-centers:


A data-center is a collection of servers where the processes or applications are hosted. Servers can be physically grouped in a room or building, or can be distributed throughout the world. On virtualized servers, an application is installed and multiple instances are allowed, so that all the virtual servers can access it. Using this principle, several virtual servers can run on one physical server. The number of virtual servers on a physical server depends upon the type of application, the size and speed of the server, and the service provided by the provider.

iii. Distributed Servers:

As said earlier, servers are often in geographically disparate locations. But to end-users the servers act as if they are operating right next to each other. This gives flexibility in operations and enhances security and privacy. If any of the servers goes down due to failure or for maintenance, the service provided by the system can still be accessed through the other distributed server(s).
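This kind of transparent failover can be sketched in a few lines. The server names and availability check below are purely illustrative; a real system would route requests through DNS, load balancers or replication middleware:

```python
# Minimal sketch of transparent failover across geographically
# distributed servers: if one replica is down (failure or
# maintenance), the request is served by another replica, so the
# end-user still sees a single working service.

def request_service(servers, is_up):
    """Return a response from the first reachable server."""
    for server in servers:
        if is_up(server):
            return "served by %s" % server
    raise RuntimeError("all replicas unavailable")

servers = ["dhaka-1", "frankfurt-2", "oregon-3"]   # hypothetical replicas
down = {"dhaka-1"}                                 # e.g. under maintenance
print(request_service(servers, lambda s: s not in down))
```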

3.2

Cloud Architectures

Cloud architectures address the difficulties that arise in large-scale data processing. In the traditional approach it is difficult to allocate processing units as per application demand. Also, it is sometimes difficult to access CPUs according to users' requirements. Job allocation is another problem: it is often difficult to distribute and maintain large-scale jobs on different machines, and a recovery mechanism on another machine is needed to survive failures. Scalability is a further issue in the traditional approach: it is difficult to scale up and scale down automatically. Cloud architectures, in contrast to traditional approaches, concentrate on solving these problems [61].


In Cloud computing, computational resources are provided as services, generally known as XaaS - X-as-a-Service. In particular, the Cloud is a virtualization of the Grid and of traditional web services. When the Cloud services and platform have been created, it is possible to give access to a virtual Grid to the companies which request it by creating Guest Virtual Organizations (GVOs) [62]. One possible distinction between the Cluster, Grid and Cloud architectures is shown in Figure 3.2.

Figure 3.2: Hierarchical abstraction layers of Cluster, Grid and Cloud Computing

In the rest of the section we discuss various approaches to Cloud architectures and give a brief overview of the underlying layers.

3.2.1

A layered model of Cloud architecture - Cloud ontology

The Cloud ontology is considered as a stack of layers. Each layer consists of one or more Cloud services. Services with the same level of abstraction (determined by their targeted users) belong to the same layer [63]. For example, the Cloud software environment is mainly targeted at programmers or developers. On the other hand, Cloud applications target end-users. So, the Cloud software environment and Cloud applications are classified into different layers.

The ordering in the Cloud stack is important; it determines the work-flow in the Cloud.

For example, Cloud applications are composed from Cloud software environments. Hence, the application layer occupies an upper position in the Cloud stack. The Cloud ontology is shown in Figure 3.3, depicted as a stack of five layers [63]: a) Cloud Application Layer, b) Cloud Software Environment Layer, c) Cloud Software Infrastructure Layer, d) Software Kernel Layer and e) Hardware Layer.

Figure 3.3: Cloud layered architecture: consists of five layers; the figure represents the interdependency between layers

(a) Cloud Application Layer:

The application layer is the top of the Cloud stack and the most visible to end-users. Users can access these services through the Internet by paying the necessary fees. The layer carries computational work from the user's terminal (input) to the processing units (e.g. data centers) where the applications are hosted. The whole procedure is abstracted from the end-users, and the outputs of CPU-intensive and memory-intensive large-scale tasks are delivered to their local machine.

From the provider's perspective, higher manageability can be achieved. The application is deployed in the provider's infrastructure, not on the client machine; hence, providers can maintain or upgrade the system without interrupting users.

This model is generally known as Software as a Service (SaaS). Cloud applications can be composed as a service for other Cloud services. Cloud applications can be developed in Cloud software environments or, sometimes, from Cloud infrastructure components.

(b) Cloud Software Environment Layer:

The layer just below the application layer is the software environment layer. This layer mainly targets developers, who build and deploy software for end-users in the Cloud. Providers in this layer supply a suitable programming-language-level development environment by means of well-defined and documented APIs. The APIs integrate developers' software and provide the necessary deployment and scalability support. The service provided by this layer is known as Platform as a Service (PaaS).

Developers benefit from developing their applications in a Cloud programming environment with support for automatic load balancing, authentication services, e-mail services etc. Developers can add the necessary services to their application on demand, which makes application development less tedious and minimizes logic faults [63]. Hadoop [64], a Cloud software environment, provides developers with a programming environment (MapReduce - a programming model for data processing on large clusters [65]). Yahoo's Pig [66] is a high-level language which can process very large files in the Hadoop environment. In this way developers can benefit from several services as necessary.
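The MapReduce model mentioned above can be sketched in a few lines: a map function emits intermediate key-value pairs, which are grouped by key and passed to a reduce function. The toy single-process version below only mimics the dataflow of Hadoop's distributed implementation; it is not Hadoop code:

```python
from collections import defaultdict

# Toy, single-process imitation of the MapReduce dataflow:
# map -> shuffle (group by key) -> reduce. A real Hadoop job runs
# the map and reduce phases in parallel across a cluster.

def map_fn(document):
    for word in document.split():
        yield word, 1                      # emit (word, 1) per occurrence

def reduce_fn(word, counts):
    return word, sum(counts)               # total occurrences of the word

def mapreduce(documents):
    groups = defaultdict(list)
    for doc in documents:                  # map phase
        for key, value in map_fn(doc):
            groups[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase

counts = mapreduce(["the cloud", "the grid and the cloud"])
print(counts)   # {'the': 3, 'cloud': 2, 'grid': 1, 'and': 1}
```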


(c) Cloud Software Infrastructure Layer:

The Software Infrastructure Layer provides necessary resources to the higher-level layers. The services offered in this layer are classified into the following subclasses: i. Computational Resources, ii. Data Storage and iii. Communications.

i. Computational Resources:

Cloud users get computational resources through Virtual Machines (VMs) in this layer. The service provided is often known as Infrastructure as a Service (IaaS). Virtualization gives the user flexibility in configuring settings; at the same time, it protects the physical infrastructure of the provider's data center [63]. Virtualization is shown in Figure 3.4, where the traditional non-cloud environment runs three different applications, each on its own server. The Cloud, on the other hand, shares the servers between the OSs and applications, which results in fewer servers [67].

Figure 3.4: Non-cloud environment needs three servers but in the Cloud, two servers are used


IaaS benefits from two types of virtualization technology: paravirtualization and hardware-assisted virtualization. Still, the problem of performance interference between VMs sharing the same cache and TLB hierarchy remains unsolved. Modern multi-core machines in main servers sometimes create performance isolation problems. This lack of performance isolation between VMs that share the same physical node is problematic for optimal performance [63]. We will cover more on virtualization in Section 3.4.

ii. Data Storage:

Data storage is another infrastructure resource in this layer; it allows users to store their data on remote storage devices and provides an access mechanism anytime and from anywhere. The service provided by Cloud providers is known as Database as a Service (DaaS). DaaS provides scalability to Cloud applications for both users and developers.

At a preliminary level, a Cloud storage system needs one data server connected to the Internet. A client can access the data by interacting with the database server using a web-based interface. The server may send back files kept by the user or provide functionality to manipulate the data on-line. In practice, however, commercial Cloud storage systems use hundreds of data servers. For server maintenance or repair purposes it is necessary to keep multiple machines to fulfill users' demand. This creates redundancy, but without this redundancy clients might not be able to access their information at any given time. Often, providers also keep copies of the data on servers running on different power supplies, which ensures that clients can still access and manipulate their data even in the case of power failures [68].

Some examples of data storage systems are: distributed file systems (e.g. the Google File System [69]), replicated relational databases (RDBMSs) (e.g. Bayou [70]) and key-value stores (e.g. Dynamo [71]). The RDBMS model puts more focus on the consistency model [72, 73], but pays a cost in the availability of data. Key-value stores, on the other hand, give much more importance to the availability of data while loosening the consistency model [63].
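The availability-over-consistency trade-off of Dynamo-style key-value stores is often realized with quorum replication. A minimal sketch, under the simplifying assumption of in-memory replicas and explicit version numbers (not the actual Dynamo implementation, which uses vector clocks and consistent hashing):

```python
# Minimal sketch of quorum replication as used by Dynamo-style
# key-value stores: a write succeeds once W replicas acknowledge it,
# and a read consults R replicas, returning the newest version seen.
# With W + R > N, every read quorum overlaps every write quorum, so
# reads see the latest committed value even if some replicas are down.

N, W, R = 3, 2, 2                       # replicas, write/read quorum sizes
replicas = [dict() for _ in range(N)]   # each replica: key -> (version, value)

def put(key, value, version, available):
    acks = 0
    for i, rep in enumerate(replicas):
        if i in available:              # only reachable replicas accept
            rep[key] = (version, value)
            acks += 1
    return acks >= W                    # write quorum reached?

def get(key, available):
    versions = [replicas[i][key] for i in sorted(available)[:R]
                if key in replicas[i]]
    return max(versions)[1] if versions else None   # newest version wins

put("user42", "v1", 1, available={0, 1, 2})
put("user42", "v2", 2, available={0, 1})   # replica 2 temporarily down
print(get("user42", available={1, 2}))     # still reads "v2" via replica 1
```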

iii. Communication:

The rate of data transfer is high in a Cloud environment. For providing Quality of Service (QoS), communication plays a vital role in the Cloud infrastructure. To meet QoS requirements, the concept of Communication as a Service (CaaS) has been introduced, which covers network security, dynamic traffic isolation or dedicated bandwidth, guaranteed message delay, communication encryption, network monitoring etc. [63]. Though CaaS is the least discussed topic in the literature, there are a couple of research publications and articles [74, 75, 76] focusing on the design and architecture of CaaS for providing QoS in communication systems. A practical example of CaaS is Microsoft's Connected Service Framework (CSF) [77]. VoIP telephone systems and instant messaging software in the Cloud can also use CaaS for better network utilization.

(d) Software Kernel:

The software kernel layer provides software management functionality for the physical servers in the Cloud. Such a software kernel can be implemented as an OS kernel, hypervisor, Virtual Machine Monitor (VMM) and/or as clustering middleware [63]. Grid applications can run in this layer, connected through several clusters of machines. But due to the lack of virtualization in Grids, periodic check-pointing and load balancing are a bit complicated, because jobs are mainly tied to the actual hardware infrastructure rather than to the kernel. Two such middleware systems for the Grid are Globus [78] and Condor [79].

(e) Hardware and Firmware:

The bottom layer in the Cloud layered architecture is the fabric layer, i.e. the actual physical hardware and switches, the so-called backbone of the Cloud [63]. Users of this layer are organizations with massive IT requirements. Providers sometimes offer Hardware as a Service (HaaS). This model helps enterprise clients, as they need not build and maintain large data centers. Services included in (but not limited to) HaaS are servers, desktops, notebooks, infrastructure components, licensing etc. [80].

Some technical challenges still exist in implementing HaaS effectively. Efficiency of speed in large-scale systems is a challenging issue. Remotely scriptable bootloaders (for example U-Boot [81]) are one solution for booting systems remotely and deploying applications hosted in distributed data centers. Other challenges in HaaS are data center management, scheduling, power consumption optimization etc. [63]. In Table 3.1 [63] we give examples of some existing Cloud systems, classified into the layers of the Cloud ontology.

Cloud Layers                    Example of existing Cloud Solutions
Cloud Application Layer         Google Apps, Salesforce Customer Relation Management (CRM)
Cloud Software Environment      Google App Engine, Salesforce Apex System
Cloud Software Infrastructure   Computational Resources: Amazon EC2, Enomalism Elastic Cloud
                                Storage: Amazon S3, EMC Storage Managed Service
                                Communication: Microsoft Connected Service Framework (CSF)
Software Kernel                 Grid and Cluster Computing Systems (for example: Globus and Condor)
Firmware or Hardware            IBM-Morgan Stanley's Computing Sublease, IBM Kittyhawk Project

Table 3.1: Examples of existing Cloud systems classified into the layers of the Cloud ontology

3.2.2

Cloud Business Model

Cloud computing provides a service-driven business model [82]. In the Cloud, hardware and platform resources (which are actually provided as services) are available on demand. Each layer discussed in the layered architecture can be offered as a service to the layer above it. In other words, every layer is a consumer of the layer below it.

Figure 3.5: Cloud computing business model

Cloud services are generally grouped into three categories: a) Infrastructure as a Service (IaaS), b) Platform as a Service (PaaS) and c) Software as a Service (SaaS).

(a) Infrastructure as a Service (IaaS): In IaaS, the customer can deploy his own software on the infrastructure. IaaS provides infrastructural resources (for example: servers, storage systems, networking devices, data center space etc. [83]) on demand, with the benefit of Virtual Machines (VMs). An organization offering IaaS is known as an IaaS provider. Common examples of IaaS providers include Amazon EC2 [84], GoGrid [85] and 3Tera [86].

(b) Platform as a Service (PaaS): PaaS provides platform-level resources, which may include support for operating systems and software development frameworks [82]. The combination of operating system and software development framework (for example the LAMP platform - Linux, Apache, MySQL, PHP) ensures the manageability and scalability of the Cloud environment [83]. Microsoft Windows Azure [87], Google App Engine [88] and Force.com [89] are common examples of PaaS providers.

(c) Software as a Service (SaaS): SaaS provides on-demand applications over the Internet. A single instance of the service (one or more pieces of software) runs in the Cloud, and multiple users connected through the Cloud can access it. Customers benefit by saving on equipment investment and software licensing costs. On the other hand, providers benefit because only a single instance of the software (service) needs to be hosted and maintained. SaaS is offered by Google [90], Microsoft [91] and Rackspace [92].

Figure 3.5 illustrates a typical Cloud business model. Based upon the layered architecture of the Cloud, PaaS providers run on top of IaaS providers' services. But in current business markets, IaaS and PaaS providers provide services jointly (for example Google and Salesforce) [82]. For that reason PaaS and IaaS providers are often considered together as infrastructure providers or Cloud providers [93].

We will cover the details of these services in Section 3.3.

3.2.3

Cloud Deployment Model

The Cloud deployment model describes the Cloud deployment scenarios available to a typical organization. The deployment model mainly defines [94]: a) External (or Public) Cloud, b) Internal (or Private) Cloud, c) Hybrid (or Integrated) Cloud and d) Community (or Vertical) Cloud.

Other than traditional Cloud solutions, an organization can implement a Cloud internally, commonly known as a Private Cloud. With a Private Cloud a business organization can achieve effective utilization of computing resources while, at the same time, the security and privacy of data can be ensured. Many analysts suggest that implementing Cloud systems internally inside an organization actually defeats the main objective of the Cloud [94].

The main focus of the traditional Cloud is obtaining computing resources from a network of Cloud service providers on demand, with provision for the dynamic addition or subtraction of capacity. Implementing an internal Cloud means internal capacity. In the traditional (public) Cloud, end-users need not pay infrastructure costs once they purchase services from the providers; but a Private Cloud, like internal data centers, incurs depreciation costs. As a matter of fact, some would argue that a Private Cloud is actually the use of internal resources through a highly virtualized hardware and application wrapper [94]. Regardless of this debate, these different types of Cloud, each with its own advantages and drawbacks, are discussed here.


(a) External (Public) Cloud:

This Cloud solution is provided by independent third-party Cloud service providers. The service providers offer their resources as services to all, from the general public to business organizations. Examples of the External (Public) Cloud deployment model are Amazon, Salesforce, Google and other Cloud service providers. Key attributes [94] of this deployment model are:

- Services are accessed through the web with a self-service user interface,
- Well documented user guides, APIs and technical support,
- Service Level Agreements (SLAs) between clients and providers,
- Availability of multiple virtual machines with various configurations based upon requirements (which includes configuration of processor, memory, operating system, application server, development environment and so on),
- Provision of different types of Cloud resources: for example, Amazon provides different services targeting different groups of users - Amazon Simple Storage Service (S3) and Amazon SimpleDB for storage, Amazon Elastic Compute Cloud (EC2) for computation etc.

Figure 3.6 shows an example of a Public Cloud. One of the major benefits of the Public Cloud is that it requires no initial investment in infrastructure. But there is controversy that Public Clouds lack control over data, network and security settings, which may hamper their effectiveness in many business organizations [82].

Figure 3.6: External or Public Cloud

(b) Internal (Private) Cloud:

Internal or Private Clouds are mainly designed for a single organization. This type of Cloud can be built and managed by the organization itself or by external providers. The benefits of a Private Cloud include the highest degree of control over performance, reliability, security and privacy. But, as said earlier, Private Clouds are criticized for their similarity to traditional proprietary servers or data centers, and hence do not provide the benefit of no up-front capital costs [82]. Figure 3.7 shows an example of a Private Cloud.

Figure 3.7: Internal or Private Cloud

Private vs. Public Cloud Computing: Several distinguishing characteristics [95] of a Private Cloud actually differentiate it from traditional distributed systems.


Firstly, a Private Cloud differs from Public Clouds in that the infrastructure in a Private Cloud is dedicated solely to a single business enterprise and is not shared with others. Its users may include corporate clients, business partners, intranet vendors or any other groups. Secondly, security credentials are generally stricter in the Private Cloud deployment model. Though a Private Cloud is not inherently more secure than a Public Cloud, an organization that has security issues and risk concerns may adopt tighter security measures.

(c) Hybrid (Integrated) Cloud:

The combination of the Public and Private Cloud models is the Hybrid (or Integrated) Cloud. In this type of deployment model, part of the services run in a Private Cloud while the rest of the services run in a Public Cloud. The Hybrid deployment model provides more adaptability, which makes it more flexible than the Public or Private models. More generally, Hybrid Clouds provide strong security features and more control over applications and data compared to Public Clouds. Besides, a Hybrid Cloud is still able to provide scalability and can serve clients' on-demand requests. But the complex part is determining the optimum partition, or splitting boundary, between the public and private components [82]. Hence, a Hybrid Cloud requires Cloud integration, and so this model is often known as the Integrated deployment model. Cloud integration and interoperability is one of the major research challenges in the Cloud industry [94]. Some Cloud interfaces and APIs, Cloud integration and interoperability standards, and tools for cross-cloud composition exist to meet business requirements, and these need to be improved for optimized performance and to meet future demands. Figure 3.8 shows an example of a Hybrid Cloud.

Figure 3.8: Example of Hybrid Cloud

Major attributes [94] of Hybrid Clouds are:

- A combination of Private (Internal) Cloud and Public (External) Cloud enabled resources.
- The cost-effectiveness of external third-party Clouds, with mitigation of risk by maintaining an internal Private Cloud for critical processes (and application data).
- Integration of externally and internally provided capabilities, which includes the integration of vendor proprietary APIs with internal interfaces.

3.3

Cloud Services

In the Cloud business model in Section 3.2.2 we gave a brief overview of Cloud services. In this Section we cover the services in more detail and describe how they are logically connected to each other.


3.3.1

Infrastructure as a Service (IaaS)

IaaS provides computing resources, such as processing or storage, that can be obtained as a service. IaaS providers typically offer virtualized infrastructure as a service so that end-users need not buy raw hardware infrastructure. The raw hardware resources, such as compute, storage and network resources, are considered the fabric layer. Typically, through virtualization, hardware-level resources are abstracted, encapsulated and exposed to end users through a standardized interface [59], as shown in Figure 3.9.

Figure 3.9: Correlation between Cloud Architecture and Cloud Services

IaaS makes it possible to provide resources such as server space, network equipment, memory, CPU cycles, storage space etc. [68]. Figure 3.10 shows an example of IaaS. The infrastructure can be dynamically scaled up or down based on the application's demand for resources.
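Dynamic scaling can be sketched as a simple control loop that adds virtual servers when demand is high and releases them when demand falls. The thresholds and bounds below are hypothetical, chosen only to illustrate the idea; real IaaS platforms expose such policies through their own autoscaling services:

```python
# Illustrative control loop for dynamic IaaS scaling: provision a VM
# when average utilization is high, release one when it is low.
# Thresholds, bounds and names are hypothetical.

SCALE_UP, SCALE_DOWN = 0.80, 0.30      # utilization thresholds
MIN_VMS, MAX_VMS = 1, 10               # capacity bounds

def scale(current_vms, utilization):
    """Return the new number of VMs for the observed utilization."""
    if utilization > SCALE_UP and current_vms < MAX_VMS:
        return current_vms + 1          # demand high: provision one more VM
    if utilization < SCALE_DOWN and current_vms > MIN_VMS:
        return current_vms - 1          # demand low: release one VM
    return current_vms                  # within band: keep current capacity

vms = 2
for load in [0.95, 0.90, 0.50, 0.10, 0.05]:   # observed average load
    vms = scale(vms, load)
print(vms)   # capacity returns to 2 after the demand spike subsides
```

Because end-users pay per use, releasing idle capacity directly reduces cost, which is the economic motivation behind this elasticity.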

3.3.2

Platform as a Service (PaaS)

Platforms are an abstraction layer between the software applications (SaaS) and the virtualized infrastructure (IaaS). PaaS is targeted at software developers. Developers can write applications based on the specifications of a particular platform without going deeper about