Cloud Forensics


Irina Mihai, Cătălin Leordeanu, Alecsandru Pătraşcu

Automatic Control and Computers Faculty, POLITEHNICA University of Bucharest

    Email: [email protected], [email protected], [email protected]

    Abstract

In the last years the Cloud environment has become a major attraction not only for developers, but even for ordinary users. Not worrying about where to keep your data or where to run your applications most definitely describes an ideal context in the digital world of today. If we add scalability, flexibility and low costs to that, we could just say "Evrika" and enjoy using the Cloud. A thing that should not be forgotten is that such massive and popular services have always been prime targets for cybercriminals, who plan attacks not necessarily for money or fame, but sometimes out of pure curiosity. This is where Cloud Forensics comes in, providing the necessary means for tracking these attacks and offering security solutions for the still-evolving Cloud.

The scope of this paper is to give a general view of the Cloud - when, where and why it appeared - and to present some of its existing implementations, together with the challenges in conducting a Cloud Forensics investigation. Several approaches to Cloud Forensics are presented, as well as an idea for future research in this area.

I. INTRODUCTION

There is a very real phenomenon: people are not aware of what the software they use is doing behind its nice interface. Sometimes people don't know if and how a piece of software is using the Internet, let alone the Cloud. Another phenomenon just as real is that other people are very much aware that others have no idea that they are using the Cloud. With the continuous growth of this new type of environment, who could really keep track of everything?

The Cloud impresses not only with its novelty status, but mostly with what it wants and has to offer: flexibility, redundancy, fast data transfers, a practically unlimited pool of resources, everything on demand and not at a high price. A complete definition has not yet been found for this newly emerged paradigm, but it is widely agreed that the Cloud comes with three types of services - Infrastructure as a Service, Platform as a Service, Software as a Service - and four deployment models - Private Cloud, Public Cloud, Hybrid Cloud, Community Cloud - some of which are presented in this paper.

There are many companies and institutions that have chosen to implement a Cloud solution. Amazon, Microsoft and Google offer public Cloud solutions, some of them free, others paid. There are also projects for developing private Clouds, each with its own structural and logical architecture. Section 3 presents some of these solutions and even how much they cost. A short comparison of current prices with those from a few years ago shows that more and more people, and even more enterprises, wish to use the Cloud.

This popularity has its advantages and disadvantages. While the advantages have been mentioned earlier, one of the main issues is that such a popular service draws the attention of attackers. The bigger and more complex a system is, the higher the probability of finding a breach, exploiting it and getting away with it. This is why Cloud Forensics is a must: to develop the tools for detecting malicious behavior and to suggest improvements for avoiding data loss, tampering and confidentiality violations.

All the nice things that the Cloud provides may become a burden in a forensics investigation. We don't just have to deal with technical aspects, but also with legal and organizational ones. It would not be of much use to have all the necessary tools for data collection and analysis but to be denied access to the physical machine on legal grounds. With Cloud sites in so many different locations, under different laws regarding security and confidentiality, it should become mandatory for forensics aspects to be agreed upon between the Cloud provider and the customer prior to any actual resource rental or usage. There is also a need for standardization in the forensics world, but all in due time. These aspects are presented in Section 4, leaving Section 5 for a short description of a future framework for Cloud Forensics.

II. CLOUD SYSTEMS

NIST (the National Institute of Standards and Technology) probably gave the most popular definition of the Cloud [23]:

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."

Therefore, cloud computing is the model that should provide not only fast access to shared data, but also extremely low resource-management effort; it is the latest distributed and parallel paradigm, one that promises reliable services implemented on modern servers, on top of virtualized resources [13].

An older model is the Grid, whose infrastructure was first conceived in the 1990s, when the ideas of on-demand computing and data sharing among dynamic resources emerged [14].

Nowadays, when both models have reached a certain level of popularity and maturity, a question naturally emerges: why isn't the Grid enough, and why would the Cloud be? To answer this question, let's make a rather quick analysis of the two.

    A. Grid computing

One of the most popular definitions of the Grid was given by Buyya in 2002, at the Grid Planet Conference [13]:

"A Grid is a type of parallel and distributed system that enables the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost and users' quality-of-service requirements."

The main purpose of the Grid is to give a solution to those problems that can't be solved on a single machine. Thus, in a Grid we may find not only personal computers, but also data centers, servers, clusters and supercomputers, all configured to share organizational information and to communicate in order to solve an extremely computationally expensive task. One of the most important novel aspects of the Grid is that all these resources may be placed in different geographical areas and still function properly together.

    There are a number of pros and cons in using the Grid.

Advantages: One advantage is that an application may benefit not only from storage resources [20], but also from services running on the Grid's machines. Considering that a Grid may also contain specialized devices, an application does not need to be above a certain level of generality in order to be executed. Another advantage is the redundancy given by the distributed configuration of the Grid: if one site suddenly fails, the user may not even be aware of it, since his application can easily be moved to another site. The Grid has also found a solution for computationally expensive tasks: its scheduler is designed to efficiently allocate tasks to those machines that report a low utilization level, meeting the user's quality-of-service requirements.

Disadvantages: Since the resources of a Grid are not physically placed together, there have been conflicting policies in the process of data sharing between domains. Another disputed aspect of the Grid is the security it provides: the virtualization method covers the data and the resources, giving the impression of one large set of resources. There have been many challenges in ensuring the security needed for accessing these resources, such as dynamic services, multiple security mechanisms and the dynamic appearance of multiple trust domains [37], which often made the Grid an unwanted solution for large scale applications.

Figure 1: Grid concept [18].

The heterogeneity of the Grid and its other limitations made it clear that there is a need for something more flexible. This need led to the emergence of the Cloud.

    B. Cloud computing

R. Buyya gave one of the first complete definitions of the Cloud: "A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resource(s) based on service-level agreements established through negotiation between the service provider and consumers" [13], as illustrated in Figure 2.

    Figure 2: Cloud concept.

From that definition it can be concluded that the Cloud is able to offer the same services as the Grid and even more. There are mainly three new hardware aspects that the Cloud brought [12]:

- the impression of limitless resources, available on request, which removes the user's need to plan ahead for his project's needs;

- no prior commitment from Cloud users, who can ask for more resources only if and when they are needed;

- the capability to pay per use, which makes it possible to rent resources for as long as the user needs them, even for short periods, and to free them at any time.

Luis M. Vaquero et al. [36] gave a more complex definition of the Cloud, one that they believe sums up many other definitions and the main aspects of what the Cloud is today:

"Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing also for an optimum resource utilization. This pool of resources is typically exploited by a pay-per-use model in which guarantees are offered by the Infrastructure Provider by means of customized SLAs." (A Service Level Agreement is the official written agreement between a Cloud provider and a customer stating which services the provider will offer.)

They have repeatedly stated that the Cloud still does not have a final and valid definition, given that it is a computing solution not yet finalized, with many ongoing usages and requirements for which rules and policies are still to be defined.

    C. Cloud computing: features and challenges

The definitions above present the idea of Cloud Computing and what it should offer, but when going from theory to actual implementation, the challenges are considerable.

In [12], Michael Armbrust et al. present a series of such challenges. Some of these include:

- Service availability. The Cloud is expected to be permanently available and to never fail. Using Cloud Computing services from multiple providers is a solution to this issue.

- Data Lock-In. Due to the lack of standardization of Cloud architectures, applications and data cannot be easily moved from one Cloud to another. When a Cloud provider runs into problems that force its prices up, the Cloud user must pay the increased prices in order not to lose outside access to his data. Obviously, the solution to this situation lies in standardizing the APIs of the Cloud platforms.

- Security. There is a popular belief that Cloud storage is easily accessible by anyone. However, making the Cloud secure is not so difficult: encrypting the data before uploading it to the Cloud and changing the encryption key from time to time would offer all the needed confidentiality (a minimal sketch of this approach follows the list).

- Data Transfer. Nowadays, the amount of data on which applications operate is constantly increasing. The cost of sending the data via a courier depends only on the weight of the hard drives, while sending the same data via the Internet costs more for each sent megabyte. There is a point in the amount of transferred data at which it becomes more advantageous to use the courier rather than the Internet; for example, at the $0.085 per GB quoted for EC2 transfers in Section 3, moving 10 TB over the network costs about $870, already more than a typical courier shipment. Sending physical disks, coupled with reducing the cost of inter-cluster transfers once the data is in the Cloud, aims at making the Cloud more affordable.

- Scalability. The system must be capable of permanently providing pay-per-use resources, and even logs, without affecting the overall performance and with support for indexing.
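As a minimal sketch of the encrypt-before-upload approach, assuming the Python cryptography package and a hypothetical local file (the text above does not prescribe a specific library or cipher):

```python
# Client-side encryption before upload; only the ciphertext leaves the machine.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # the key is kept outside the Cloud
cipher = Fernet(key)

with open("report.pdf", "rb") as f:  # hypothetical file to protect
    plaintext = f.read()

ciphertext = cipher.encrypt(plaintext)   # this is what gets uploaded

# Key rotation: periodically re-encrypt under a fresh key.
new_key = Fernet.generate_key()
ciphertext = Fernet(new_key).encrypt(cipher.decrypt(ciphertext))
```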

    III. CLOUD COMPUTING INFRASTRUCTURE

All the Cloud services presented up to this point fall into three categories [23]:

- Software as a Service (SaaS). The Cloud provider offers applications that the user can access and use online.

- Platform as a Service (PaaS). The user is given the possibility to upload his own applications to a Cloud site and use them from there. In this case, one must pay attention to the compatibility between the programming language used and the platform.

- Infrastructure as a Service (IaaS). The consumer now has access to processing power, storage and other raw resources. This type of service is the only one where the user can control the operating systems, manage the storage and even configure elements of the network.

Figure 3 shows what these services consist of, how they interact and who offers them. The NIST definition of Cloud Computing [23] also presents the deployment models of Cloud Computing:

- Public Cloud. In this model, the location of the site belongs to the Cloud provider. Here, multiple organizations can own and share the services provided by the Cloud, but their data and applications are kept separated. There are several public Clouds in continuous development: Amazon EC2, Google App Engine, Microsoft Azure.

- Private Cloud. This model can also be considered an internal cloud or enterprise cloud [11]. The infrastructure of the Cloud is specifically designed to provide services for a single organization, which usually owns the location of the site. Multiple solutions for setting up a private Cloud are used by organizations: VMware, VirtualBox, OpenStack.

- Hybrid Cloud. As the name suggests, this model implies the collaboration of two or more different Cloud infrastructures which remain separate, but can share data and even applications based on standardized policies. There are a number of reasons for choosing this model: the ease of moving applications between different sites, the ability to rapidly access the data from another cloud, and the possibility to combine the resources provided by several clouds and use them as a whole.

    Figure 3: Cloud services [17].

Each solution given for the Cloud models comes with its own approach and its own way of making the Cloud what the users expect it to be. In the following section we briefly present these approaches and the results they obtained.

    A. Public infrastructure

In this section we will focus on the Cloud solutions offered by Amazon Elastic Compute Cloud (EC2), Google App Engine and Microsoft Azure.

1) Amazon Elastic Compute Cloud (EC2): EC2 offers an IaaS type of service. As stated in [25], "Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster. You can use Amazon EC2 to launch as many or as few virtual servers as you need, configure security and networking, and manage storage."

What EC2 actually offers is support for running multiple applications, Linux and Windows based. Onto the same set of physical resources, several virtualized machines are mapped and given to users. To accomplish this, EC2 uses the open-source Xen hypervisor.

When a user wants to use EC2 resources, the first step is to send a request for one of the available machines, here called instances. There are several types of instances the user can choose from, some of them presented in Table I and the others found in [8].

The prices for data transfer into and out of Amazon EC2 depend on the source and destination of the transfer and vary between $0 and $0.085 per GB. There are also instances which offer optimized storage, like i2.xlarge, i2.2xlarge, i2.4xlarge and i2.8xlarge, whose costs are between $0.853 and $6.820 per hour.

After choosing the instance, the user has to specify the virtual machine (VM) he wants to deploy on EC2's physical hardware. Once the deployment is completed, the instance starts its booting process, after which it can be used as a normal computer, usually via ssh. The physical resource passes through two statuses during this process: running, during boot, and installed, during actual usage (after boot) [1]. The costs presented in Table I are calculated for the time a resource is marked as installed.

Instance type   ECU (cores)   RAM (GB)   Architecture (bit)   Disk (GB)   Linux ($/hour)   Windows ($/hour)
m1.small        1 (1)         1.7        32/64                160         0.044            0.075
m1.large        4 (2)         7.5        64                   840         0.175            0.299
m1.xlarge       8 (4)         15.0       64                   1680        0.350            0.598
c1.medium       5 (2)         1.7        32/64                350         0.130            0.210
c1.xlarge       20 (8)        7.0        64                   1680        0.520            0.840

Table I: Amazon EC2 instance types

One user can run up to 20 instances simultaneously. The term elastic refers to the fact that the user completely controls his infrastructure by opening and closing instances within the 20-instance limit.
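As a minimal sketch of this request/use/terminate cycle, here is what launching and releasing an instance could look like with boto3, the AWS SDK for Python (an assumption: the text describes the workflow, not a specific client; the image id is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Request one m1.small instance; the AMI id below is a placeholder.
response = ec2.run_instances(ImageId="ami-00000000",
                             InstanceType="m1.small",
                             MinCount=1, MaxCount=1)
instance_id = response["Instances"][0]["InstanceId"]

# The instance is now "running" (booting); once booted it is "installed",
# usable over ssh, and billed per hour for that time.

# Releasing the resource stops the charges.
ec2.terminate_instances(InstanceIds=[instance_id])
```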

2) Google App Engine (GAE): Google developed a technique different from the server virtualization used by Amazon EC2: a technique-specific sandbox [30]. Google states that "Google App Engine is a Platform as a Service (PaaS) offering that lets you build and run applications on Google's infrastructure. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs change. With App Engine, there are no servers for you to maintain. You simply upload your application and it's ready to go." [6]

The initial scope of GAE was to enlarge the Web and make it even more appealing. Google's infrastructure offers support for applications written in a variety of programming languages [6]: Java, Python, PHP, Go.

Given that App Engine is mostly intended for Web applications, there is a rather large number of requests and replies, which are quantified. These requests and replies are CPU intensive and that is why they are rationed. For example, if an application is extremely popular and receives thousands of requests per day, some of them will be processed for free, within the free quota, and the others will be charged. This is part of Google's pricing philosophy: use for free within a certain amount of resources and pay for more.

The term sandbox means that an application is allowed to do only as much as the Google Cloud provider allows it to do. The Python development environment permanently checks all actions and stops those that are potentially unsafe. If an application is detected performing such unsafe operations, it is automatically shut down to ensure that no other applications are harmed. The user cannot install whatever APIs he wants (only the supported ones), because everything is carefully monitored. In addition, the user is not aware of the requests that his application receives and, even more, of where and how his application is being run. Everything is taken care of by GAE. To sum up, GAE keeps a copy of your application somewhere in one of its data centers. If your application is not at all popular, it may not even be run, but if it is run, it may be run on several sites. When the application receives a request, GAE identifies it as being for you and knows how to send it to the destination. The same thing happens for the reply. The only concern of the developer is to use the resources efficiently when handling the received requests.
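To make the sandbox model concrete, here is a minimal App Engine Python handler in the webapp2 style documented by Google at the time; the developer only writes request-handling code, while GAE decides where it runs:

```python
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        # GAE routed this request to our application; we never learn
        # which data center or physical machine is serving it.
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from the sandbox!')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
```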

Google App Engine was integrated into a broader project: Google Cloud. Google Cloud offers products for computing, storage, networking, Big Data, API translation and prediction, and deployment management. These services are presented in Figure 4.

    Figure 4: Google Cloud Platform [3].

Besides the PaaS solution, Google App Engine, Google also developed an IaaS solution, Compute Engine, which comes with a large number of possible configurations for the virtual machines. The set of available standard operating system images is quite diverse: CentOS 6, CentOS 7, Debian 7 Wheezy, Debian 7 Wheezy Backports, Red Hat Enterprise Linux, SUSE, Ubuntu, Windows Server.

The first step in using Compute Engine is to activate this service from the Google Developers Console, the GUI that provides support for managing the services Google offers. After that, the user can easily create a virtual machine with the desired characteristics: machine type, image type, disk type. It is also advisable to set up a firewall to enable a first level of filtering from the Internet. The created VM can then be accessed via ssh and used for development.

Google Compute Engine offers two ways of managing projects: a command-line tool and the Compute Engine Console, which comes as a graphical user interface [7].
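As a minimal sketch of the programmatic route, assuming the Google API client library for Python and placeholder project and zone names (the documentation cited above also covers the command-line tool):

```python
from googleapiclient import discovery

# Builds a Compute Engine client using Application Default Credentials.
compute = discovery.build('compute', 'v1')

# List the virtual machines in one zone of a hypothetical project.
result = compute.instances().list(project='my-project',
                                  zone='europe-west1-b').execute()
for vm in result.get('items', []):
    print(vm['name'], vm['status'])
```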

3) Microsoft Azure: In [12] it is stated that Azure is "intermediate between complete application frameworks like AppEngine on the one hand, and hardware virtual machines like EC2 on the other".

Through the applications written using .NET libraries, Azure now offers support not only for Windows Server but also for Linux based virtual machines. Users can write their programs in .NET, PHP, Node.js, Java, Ruby and Python. The Microsoft Azure platform is formed of three components [15]:

- Windows Azure provides the platform for running Windows based applications and for managing the needed data.

- SQL Azure is the main service that manages the data operations.

- Windows Azure platform AppFabric ensures that the applications and the data in the Cloud are able to communicate and share information.

From the infrastructure point of view, Azure offers not only the possibility to create, deploy and work on your own virtual machines, but also to use pre-configured environments and get directly down to business. The separation of instances is made through the Windows Azure Hypervisor (WAH).

Windows Azure classifies its services into four categories [35]:

- Compute offers computing power and includes: Virtual Machines, Web Sites, Cloud Services, Mobile Services.

- Network offers a way for the users to be provided with Windows Azure applications and contains: Virtual Network, Traffic Manager.

- Data offers the means for data management, such as collection, analysis and storage, and comprises: Data Management, Business Analytics, HDInsight, Cache, Backup, Recovery Manager.

- App offers support for elements such as security and performance improvement and includes: Media Services, Messaging, Notification Hubs, Active Directory, Multifactor Authentication.

Table II is a summary of the Microsoft Azure instances and their prices, as found in [2]. The machines are paid per minute and there are two tiers of service: the Basic tier, for less demanding applications, and the Standard tier, for those applications that require more memory, more CPU and faster networking operations.

Instance (Standard tier)   Cores   RAM (GB)   Disk size (GB)   Price ($/hour)
A0                         1       0.75       20               0.02
A1                         1       1.75       70               0.09
A2                         2       3.5        135              0.18
A3                         4       7          285              0.36
A4                         8       14         605              0.72
A5                         2       14         135              0.33
A6                         4       28         285              0.66
A7                         8       56         605              1.32

Table II: Windows Azure Standard tier instances
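Since the machines are billed per minute, a short worked example using the A1 row of Table II ($0.09/hour) shows how a bill accumulates:

```python
# Cost of one Standard-tier A1 instance for one week, billed per minute.
hourly_rate = 0.09                      # $/hour for A1 (Table II)
minutes_used = 7 * 24 * 60              # one week of uptime
cost = hourly_rate / 60 * minutes_used
print("One week on A1: $%.2f" % cost)   # prints: One week on A1: $15.12
```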

    B. Private infrastructure

In this section we present three of the open-source software solutions for private Clouds: OpenStack, Eucalyptus and OpenNebula.

1) OpenStack: The OpenStack Project, initially developed by NASA [29], offers an IaaS model which not so long ago had only seven components [5], but has since developed three more. The OpenStack architecture contains [4]:

- Dashboard (Horizon) comes with a Web GUI for the other OpenStack services.

- Compute (Nova) offers the necessary software for working with virtual machines. It is similar to Amazon EC2, ensuring scalability and redundancy.

- Network (Neutron) ensures connectivity between machines running OpenStack software.

- Object Store (Swift) offers the software for managing the long-term storage of data on the order of petabytes.

- Block Storage (Cinder) offers, as the name suggests, block storage for guest virtual machines.

- Image Service (Glance) provides services for virtual disk images.

- Identity (Keystone) manages the security of the OpenStack services.

- Telemetry (Ceilometer) is responsible for billing, metering, rating and autoscaling.

- Orchestration (Heat) conducts the interactions between different Cloud applications.

- Database Service (Trove) offers support for relational and non-relational databases.

The interaction of the components presented above is illustrated in Figure 5.

    Figure 5: OpenStack architecture [5].

If a user wants to use OpenStack, he will do so either through the Dashboard service or through the APIs every service provides (a minimal example follows below). The Identity service ensures authentication, after which all the other services are accessible.

The project is developed in Python and runs under an Apache 2 license. The provided GUI gives the user the liberty to create not only virtual machines with default values of CPU and RAM, but also to set those numbers himself [32]. Given that the OpenStack Project aims to support a considerably large infrastructure, it uses several hypervisors for creating and running the virtual machines: Xen, KVM, Hyper-V, QEMU [19].
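As a minimal sketch of the API route, assuming the python-novaclient library and placeholder credentials and endpoint (the Dashboard ultimately drives this same Compute API):

```python
from novaclient import client

# Authenticate against Keystone; all values below are placeholders.
nova = client.Client("2", "admin", "secret", "demo",
                     "http://controller:5000/v2.0")

flavor = nova.flavors.find(name="m1.small")
image = nova.images.find(name="cirros")

# Nova asks the configured hypervisor (Xen, KVM, ...) to boot the VM.
server = nova.servers.create(name="test-vm", image=image, flavor=flavor)
print(server.id, server.status)
```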

2) Eucalyptus: Unlike OpenStack, which supports only private infrastructures, Eucalyptus also provides solutions for hybrid infrastructures. It is written in several programming languages: Java, C and Python.

Eucalyptus was designed to work under a hierarchical architecture, using an emulation of Amazon EC2's SOAP interface. This way, the user can create, access, manage and terminate virtual instances just like on Amazon EC2, presented in Section 3.1.1; because the API is EC2-compatible, EC2 client tools can be pointed at a Eucalyptus endpoint. In the beginning, the virtual machines ran only on top of the Xen hypervisor, but now KVM/QEMU and VMware are also supported.

Figure 6 shows the hierarchical architecture of Eucalyptus and its four main components. The Node Controllers are in charge of the virtual machines on the physical machine on which they are installed. The Cluster Controller gathers information from a number of Node Controllers and manages VM scheduling and instances. The Cloud Controller is the core of the architecture; it takes care of the high-level aspects of the system, giving commands to the Cluster Controllers about scheduling and resource allocation. The Storage Controller includes Walrus, which is a storage service [19]; it provides storage for the VM images and can be used as an HTTP put/get solution.

    Figure 6: Eucalyptus architecture [9].

3) OpenNebula: OpenNebula is a Linux based solution for mostly private, but also public, Cloud infrastructures. The virtualization methods used are Xen, VMware and KVM. As stated in [34], the solution is not environment dependent, offering flexibility and modularity through its three components: the OpenNebula core, the Capacity Manager and the Virtualizer Access Drivers.

The architecture upon which OpenNebula was developed is more classical, offering a front-end part and a series of clusters for running the virtual machines [39]. The programming languages used for development are Ruby, C++ and Java.

Lately, a decrease in the popularity of OpenNebula has been observed, while OpenStack and Eucalyptus are being preferred [39].


    IV. FORENSICS

    A. Introduction to Cloud Forensics

Cloud Forensics does not aim to secure the systems, but to detect the infiltrations and to offer the authorities a way to track the source of the attacks.

The notion of Cloud Forensics was first addressed at a large scale in 2009 [38]. While companies continue to offload their IT application infrastructure to the Cloud, the criminals who target these systems are more and more attracted to them. No matter how secure a system is, there is always a breach to be found, and Cloud systems are no exception. As the rewards a criminal can acquire grow with the size of the system, attacks on these systems cannot be avoided.


The tools used in Cloud Forensics are different from those used for studying standard computer systems, as the latter prove insufficient when applied at a large scale.

In [31], Ruan et al. presented Cloud Forensics as a cross-discipline between Cloud computing and digital forensics, for which NIST gives the following definitions: Cloud computing is "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction", and digital forensics is "the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data".

    Also, they have divided the Cloud Forensics problem into three main categories: organizational, legal and technical.

The legal aspect covers the multi-jurisdiction and multi-tenancy issues, which demand that Cloud Forensics operations neither contradict any laws nor tamper with the confidentiality of other tenants. There are also the Service Level Agreements (SLAs), which contain the set of rules and demands governing the provider-customer interaction at the service level. The SLAs must specify the conditions that Cloud Forensics investigators have to respect during their work.

The organizational aspect establishes the personnel, the collaboration permitted and the external involvement necessary for the forensics action to take place efficiently and successfully. The following roles have been found necessary in conducting these activities: investigators, IT professionals, incident handlers, legal advisors and external assistance.

The technical aspect is the one where the action really takes place. For this kind of Cloud investigation, designated tools must be used, with respect to the legal aspect presented above. There are several characteristics that are a must for forensics tools: data collection must be done carefully, so as not to violate the space of other tenants, and it must ensure data integrity. Given the dynamic architecture of the Cloud, the forensics tools must be of several types: elastic, static and live [31]. Another definite need for these tools is the capability to segregate the collected data: evidence that does not belong to the person whose Cloud space has been attacked will clearly put the investigation on a wrong track. Since the Cloud is a mostly virtualized system, the tools for examining intrusions should be able to analyze such an environment.

    B. Forensics Challenges

As can be seen in the previous section, the rules and regulations for conducting a Cloud Forensics investigation are not few. Adding the architectural complexity of Cloud systems and their extremely wide usage, Cloud Forensics has met several challenges, mapped below onto the technical capabilities that the investigation tools must provide [31].

Data collection is the first step in the Cloud Forensics process. For collecting the data, one must first gain access to it. Clearly, depending on the Cloud deployment model, there are several degrees of difficulty in doing so. If we are dealing with an IaaS model, access to the data is relatively easy to obtain. This is not the case for the SaaS model, where the customer has no idea where his application is actually run or where his data is kept. The completeness of SLAs is here put in doubt, since the forensics aspects are not sufficiently elaborated. For example, customers don't have access to log files such as IP logs, or to recent data from current or former virtual machines.

The biggest challenge for elastic, static and live forensics appears to be synchronization [31]. Considering the extremely high degree of availability that the Cloud offers and the number of devices - fixed and mobile - that can access it, synchronizing logs from different physical machines, placed in geographically different locations, becomes an issue. Another issue is the unification and conversion of logs, mostly because of the very large number of formats they come in.

Throughout the present paper it has repeatedly come into discussion how the instances that run on the same physical machine are segregated. Usually this is done by the hypervisor, which also plays an important role in the forensics world. Keeping the instances separated, logging what happens underneath the hypervisor (shared resources) and not entering a Cloud neighbor's space during an investigation are crucial for offering a valid result in the end. Another issue appears here if the data kept on the Cloud was encrypted prior to upload and that data becomes subject to forensics: the keys can be obtained only after agreements between the customer, the Cloud service provider and the representatives of the law.

When talking about virtualized environments, the hypervisor is the main element. The hypervisor takes care of how a virtual machine runs, so it is no surprise that attackers would want to hack it. If the hypervisor is tampered with, then all the virtual machines on top of it are compromised and, thus, all the data kept on that physical machine. Unfortunately, there is as yet no set of policies for hypervisor developers to follow in order to make hypervisor forensics easier.

The challenges are not only technical, but also legislative and human. The SLAs are usually incomplete, lacking provisions for forensics situations; the countries where Cloud sites reside have different laws regarding data access and confidentiality; and even the personnel may not always have the required experience to deal with a complex investigation.

Although the difficulties presented here are permanent, there are several directions for diminishing the Cloud Forensics challenge.

    C. Forensics directions

Besides the increased difficulties that the Cloud design brings to forensics actions, there are also characteristics that help their progress. Given the large usage of Cloud services, any implementation made here is cheaper, forensics included. Another important aspect is that data deletion is probably never complete, thanks to the provided redundancy, so data recovery might not be so hard to do. Also, the fact that virtual machines can be cloned on demand gives the opportunity to conduct a parallel analysis on those machines. The pay-per-use character of the Cloud has its benefits here, since logs can also be generated and stored on demand, giving a broader view over the evolution of a certain instance.

Several aspects are presented in [26] that the Cloud should consider in the process of becoming more secure:

- Information security. It is recommended that all data be encrypted, no matter where it is stored. Also, tracking who has access to information, knowing which machine uses which data and monitoring the operations on data would be considerable steps forward in making the Cloud a safer environment.

- Trust management in remote servers. Third parties responsible for data and security audits should be involved in the Cloud usage process if more companies are to use this service.

- Information privacy. If data is to be encrypted when uploaded to the Cloud, then new search and indexing mechanisms are needed which do not require decryption to return a valid result. Homomorphic encryption and PIR (private information retrieval) further support the transition towards fully encrypted data within the Cloud.

Raffael Marty gives a more complex view of what logging means in Cloud Computing [22]. Aside from the fact that logs are kept on multiple different machines, they are only available for relatively short periods of time. Moreover, every tier of the Cloud system generates logs: the operating system, the network services and almost all the running applications. In addition, not all users have access to the logs; certain users have access to certain logs, and only for a while. Once the needed logs are gathered, processing and analyzing them is made harder by the excessive number of formats they come in. There is at least one rule that all logs should obey: they must all answer the questions when?, why?, who? and what? (a minimal example follows). Marty also presents the main steps for setting up a log management system: every infrastructure and application must first enable logging, then the transport of the logs must be ensured, and in the end the mechanisms for processing the logs must be provided.
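A minimal illustration of a log record that answers all four questions; the field names and values are purely illustrative, not a standard schema:

```python
import json
import time

record = {
    "when": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # when?
    "who":  "user42",                                            # who?
    "what": "object_read",                                       # what?
    "why":  "customer_download",                                 # why?
}
print(json.dumps(record))
```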

A framework that follows these steps is presented in [28]. The implemented architecture contains five layers: a management layer, a virtualization layer, a storage layer, a layer for data analysis and one for result centralization. All these layers are represented by jobs in a distributed environment. Figure 7 presents these five layers and the way they interact. To ensure minimal exposure to tampering, all the files involved in the logging process are hashed.
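As a minimal sketch of that integrity step, assuming Python's standard hashlib and a hypothetical log path:

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 digest of a log file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stored next to the log; recomputing and comparing exposes tampering.
print(file_digest("/var/log/cloud/instance-0.log"))
```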

    Figure 7: Cloud Forensics Logging Framework [28].

A forensics framework for Cloud Computing is presented in [27]. The proposed architecture is shown in Figure 8. The application has seven constituent modules, plus a database layer:

- the GUI offers a web interface where the user can choose the number of instances and the software to be used, and can establish for how long he wants to use those instances;

- the Frontend module filters the requests, granting access to the User Manager only for those which are accurate;

- the User Manager module acts as authenticator and lease validator (a lease is the renting contract between the user that requests certain resources and the system that offers them [27]);

- the Lease Manager module handles the leases and creates the needed jobs after analyzing them;

- the Scheduler decides on which physical machine to run a certain lease by calculating an average number of instances for each node (a minimal sketch of this rule follows the list);

- the Hypervisor Manager has no other role than to manage the hypervisors in the system;

- the Monitor module keeps an eye on the entire system and decides the number of virtual machines needed for every lease;

- the Database Layer offers support for keeping the information needed by the Lease Manager, the Scheduler and the Hypervisor Manager.
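As a minimal sketch of the Scheduler's placement rule described in the list above (the data structures are illustrative; [27] does not publish an API):

```python
def place_lease(nodes):
    """Pick the node with the fewest running instances.

    nodes maps a node name to its current instance count; the average
    is the per-node load figure the Scheduler reasons about.
    """
    average = sum(nodes.values()) / float(len(nodes))
    target = min(sorted(nodes), key=lambda name: nodes[name])
    return target, average

nodes = {"node-1": 4, "node-2": 7, "node-3": 2}
print(place_lease(nodes))   # -> ('node-3', 4.33...)
```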



    Figure 8: System Architecture [27].

The implementation of this framework showed that enabling forensics modules does not necessarily bring a considerable overhead, in this case reaching a maximum of 8% of the total load.

    V. CONCLUSION AND FUTURE WORK

The scope of the present paper was not only to analyze the Cloud and the tendencies in Cloud Forensics, but also to open a way towards a new Cloud Forensics solution.

As a virtualization solution, besides the hypervisors, there is a new concept that can offer the same environment as a virtual machine, but with less overhead: containers. LXC (Linux Containers) is a set of tools which allow control over the Linux kernel's isolation features. It is free software, written in C, with bindings for languages such as Python, Go, Ruby and Haskell. Its first stable release, 1.0, appeared in February 2014 and will be supported until 2019.
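As a minimal sketch of container control, assuming the python3-lxc bindings (LXC is equally scriptable through its lxc-* command-line tools; the container name and template arguments are placeholders):

```python
import lxc

container = lxc.Container("audit-node")
# Build a minimal Ubuntu root filesystem from the "download" template.
container.create("download", 0,
                 {"dist": "ubuntu", "release": "trusty", "arch": "amd64"})
container.start()
print(container.state)   # RUNNING, with far less overhead than a full VM
container.stop()
container.destroy()
```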

We propose a Cloud Forensics framework composed of two main elements: container-based audit and intrusion detection triggered by policy violation.

An important conclusion to be drawn from this paper is that there is never enough: never enough memory, never enough space, never enough security. What we tried to emphasize was the completely dynamic and fast evolution of Cloud Computing and, thus, the need for security and audit tools that evolve just as fast.

Studying the past, the present and the state of the art of both Cloud Computing and Cloud Forensics brought forth the idea of a new forensics framework, based on a current and stringent need and on brand new development tools.

    REFERENCES

[1] Amazon Elastic Compute Cloud User Guide for Linux. API Version 2014-10-01.
[2] http://azure.microsoft.com/en-us/pricing/details/virtual-machines/#windows. Technical report.
[3] http://cloudacademy.com/blog/google-cloud-platform-new-announcements-and-features/. Technical report.
[4] http://docs.openstack.org/admin-guide-cloud/content/ch_getting-started-with-openstack.html. Technical report.
[5] http://ken.pepple.info/. Technical report.
[6] https://cloud.google.com/appengine/docs/whatisgoogleappengine. Technical report.
[7] https://cloud.google.com/compute/docs/. Technical report.
[8] http://www.ec2instances.info/. Technical report.
[9] http://www.institut-numerique.org/summary-51c0279d01413. Technical report.
[10] 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 22-24 June 2003, Seattle, WA, USA. IEEE Computer Society, 2003.
[11] Public or Private Cloud: The Choice is Yours. Technical report, Aerohive Networks, 2013.
[12] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.
[13] Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. In High Performance Computing and Communications, 2008 (HPCC '08), 10th IEEE International Conference on, pages 5-13. IEEE, 2008.
[14] Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of High Performance Computing Applications, 15(3):200-222, 2001.
[15] Jianya Gong, Peng Yue, and Hongxiu Zhou. Geoprocessing in the Microsoft cloud computing platform - Azure. In Proceedings of the Joint Symposium of ISPRS Technical Commission IV & AutoCarto, page 6. Citeseer, 2010.
[16] Eugene Gorelik. Cloud computing models. PhD thesis, Massachusetts Institute of Technology, 2013.
[17] C. N. Hofer and G. Karagiannis. Cloud computing services: taxonomy and comparison. Journal of Internet Services and Applications, 2(2):81-94, 2011.
[18] Bart Jacob, Michael Brown, Kentaro Fukui, Nihar Trivedi, et al. Introduction to grid computing.
[19] Srivatsan Jagannathan. Comparison and evaluation of open-source cloud management software. 2012.
[20] Kiranjot Kaur and Anjandeep Kaur Rai. A comparative analysis: Grid, cluster and cloud computing.
[21] Jerome Lauret, Matthew Walker, Sebastien Goasguen, and Levente Hajdu. From grid to cloud, the STAR experience, 2010.
[22] Raffael Marty. Cloud application logging for forensics. In Proceedings of the 2011 ACM Symposium on Applied Computing, pages 178-184. ACM, 2011.
[23] Peter Mell and Tim Grance. The NIST definition of cloud computing. 2011.
[24] Daniel Nurmi, Richard Wolski, Chris Grzegorczyk, Graziano Obertelli, Sunil Soman, Lamia Youseff, and Dmitrii Zagorodnov. The Eucalyptus open-source cloud-computing system. In Cluster Computing and the Grid, 2009 (CCGrid '09), 9th IEEE/ACM International Symposium on, pages 124-131. IEEE, 2009.
[25] Simon Ostermann, Alexandru Iosup, Nezih Yigitbasi, Radu Prodan, Thomas Fahringer, and Dick Epema. A performance analysis of EC2 cloud computing services for scientific computing. In Cloud Computing, pages 115-131. Springer, 2010.
[26] Alecsandru Patrascu, Diana Maimut, and Emil Simion. New directions in cloud computing. A security perspective. In Communications (COMM), 2012 9th International Conference on, pages 289-292. IEEE, 2012.
[27] Alecsandru Patrascu and Victor-Valeriu Patriciu. Implementation of a cloud computing framework for cloud forensics.
[28] Alecsandru Patrascu and Victor-Valeriu Patriciu. Logging framework for cloud computing forensic environments. In Communications (COMM), 2014 10th International Conference on, pages 1-4. IEEE, 2014.
[29] Ken Pepple. Deploying OpenStack. O'Reilly Media, Inc., 2011.
[30] Ling Qian, Zhiguo Luo, Yujian Du, and Leitao Guo. Cloud computing: An overview. In Cloud Computing, pages 626-631. Springer, 2009.
[31] Keyun Ruan, Joe Carthy, Tahar Kechadi, and Mark Crosbie. Cloud forensics: An overview.
[32] Omar Sefraoui, Mohammed Aissaoui, and Mohsine Eleuldj. OpenStack: toward an open-source solution for cloud computing. International Journal of Computer Applications, 55(3):38-42, 2012.
[33] Charles Severance. Using Google App Engine. O'Reilly Media, Inc., 2009.
[34] Borja Sotomayor, Ruben Santiago Montero, Ignacio Martin Llorente, and Ian Foster. Capacity leasing in cloud systems using the OpenNebula engine. In Workshop on Cloud Computing and its Applications, volume 3, 2008.
[35] Mitch Tulloch. Introducing Windows Azure for IT Professionals. Microsoft Press, 2013.
[36] Luis M. Vaquero, Luis Rodero-Merino, Juan Caceres, and Maik Lindner. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 39(1):50-55, 2008.
[37] Von Welch, Frank Siebenlist, Ian T. Foster, John Bresnahan, Karl Czajkowski, Jarek Gawor, Carl Kesselman, Sam Meder, Laura Pearlman, and Steven Tuecke. Security for grid services. In 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 22-24 June 2003, Seattle, WA, USA [10], pages 48-57.
[38] Stephen D. Wolthusen. Overcast: Forensic discovery in cloud environments. In IT Security Incident Management and IT Forensics, 2009 (IMF '09), Fifth International Conference on, pages 3-9. IEEE, 2009.
[39] Sonali Yadav. Comparative study on open source software for cloud computing platform: Eucalyptus, OpenStack and OpenNebula. International Journal of Engineering and Science, 3(10):51-54, 2013.
