
Bringing Introspection into BlobSeer: Towards a

Self-Adaptive Distributed Data Management System

Alexandra Carpen-Amarie, Alexandru Costan, Jing Cai, Gabriel Antoniu, Luc Bougé

To cite this version:

Alexandra Carpen-Amarie, Alexandru Costan, Jing Cai, Gabriel Antoniu, Luc Bougé. Bringing Introspection into BlobSeer: Towards a Self-Adaptive Distributed Data Management System. International Journal of Applied Mathematics & Computer Science, University of Zielona Góra Press, Poland, 2011, 21 (2), pp. 229-242. <http://versita.metapress.com/content/3381m823521133q7/fulltext.pdf>. <10.2478/v10006-011-0017-y>. <inria-00555610>

HAL Id: inria-00555610

https://hal.inria.fr/inria-00555610

Submitted on 14 Jan 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Int. J. Appl. Math. Comput. Sci., 2011, Vol. 21, No. 2, 229–242, DOI: 10.2478/v10006-011-0017-y

BRINGING INTROSPECTION INTO BLOBSEER: TOWARDS A SELF-ADAPTIVE DISTRIBUTED DATA MANAGEMENT SYSTEM

ALEXANDRA CARPEN-AMARIE ∗, ALEXANDRU COSTAN ∗∗, JING CAI ∗∗∗, GABRIEL ANTONIU ∗, LUC BOUGÉ ∗∗∗∗

∗ INRIA Rennes - Bretagne Atlantique, France, e-mail: {alexandra.carpen-amarie,gabriel.antoniu}@inria.fr

∗∗ University Politehnica of Bucharest, e-mail: [email protected]

∗∗∗ City University of Hong Kong, e-mail: [email protected]

∗∗∗∗ ENS Cachan/Brittany, IRISA, Rennes, France, e-mail: [email protected]

Introspection is the prerequisite of an autonomic behavior, the first step towards performance improvement and resource-usage optimization for large-scale distributed systems. In Grid environments, the task of observing the application behavior is assigned to monitoring systems. However, most of them are designed to provide general resource information and do not consider specific information for higher-level services. More precisely, in the context of data-intensive applications, a specific introspection layer is required to collect data about the usage of storage resources, about data access patterns, etc. This paper discusses the requirements for an introspection layer in a data-management system for large-scale distributed infrastructures. We focus on the case of BlobSeer, a large-scale distributed system for storing massive data. The paper explains why and how to enhance BlobSeer with introspective capabilities and proposes a three-layered architecture relying on the MonALISA monitoring framework. We illustrate the autonomic behavior of BlobSeer with a self-configuration component aiming to provide storage elasticity by dynamically scaling the number of data providers. Then we propose a preliminary approach for enabling self-protection for the BlobSeer system, through a malicious-client detection component. The introspective architecture has been evaluated on the Grid'5000 testbed, with experiments that prove the feasibility of generating relevant information related to the state and the behavior of the system.

Keywords: distributed system, storage management, large-scale system, monitoring, introspection.

1. Introduction

Managing data at a large scale has become a critical requirement in a wide spectrum of research domains, ranging from data mining to high-energy physics, biology or climate simulations. Grid infrastructures provide the typical environments for such data-intensive applications, enabling access to a large number of resources and guaranteeing a predictable Quality of Service. However, as the exponentially growing data volume is correlated with an increasing need for fast and reliable data access, data management continues to be a key issue that highly impacts the performance of applications.

More specifically, storage systems intended for very large scales have to address a series of challenges, such as a scalable architecture, data location transparency, high throughput under concurrent accesses and the storage of massive data with fine-grained access. Although these requirements are the prerequisites for any efficient data-management system, they also imply a high degree of complexity in the configuration and tuning of the system, with possible repercussions on the system's availability and reliability.

Such challenges can be overcome if the system is outfitted with a set of self-management mechanisms that enable autonomic behavior, which can shift the burden of understanding and managing the system state from the human administrator to an automatic decision-making engine. However, self-adaptation is impossible without a deep and specific knowledge of the state of both the system and the infrastructure where the system is running. It heavily relies on introspection mechanisms, which play the crucial role of exposing the system behavior accurately and in real time.

On existing geographically-distributed platforms (e.g., Grids), introspection is often limited to low-level tools for monitoring the physical nodes and the communication interconnect: they typically provide information such as CPU load, network traffic, job status, file transfer status, etc. In general, such low-level monitoring tools focus on gathering and storing monitored data in a scalable and non-intrusive manner (Zanikolas and Sakellariou, 2005).

Even though many Grid monitoring applications have been developed to address such general needs (Massie et al., 2004; Gunter et al., 2000), little has been done when it comes to enabling introspection for large-scale distributed data management. This is particularly important in the context of data-intensive applications distributed at a large scale. In such a context, specific parameters related to data storage need to be monitored and analyzed in order to enable self-optimization in terms of resource usage and global performance. Such parameters regard physical data distribution, storage space availability, data access patterns, application-level throughput, etc.

This paper discusses the requirements of a large-scale distributed data-management service in terms of self-management. It explains which self-adaptation directions can serve a data-management service designed for large-scale infrastructures. Furthermore, it focuses on introspection, identifying the specific ways in which introspection can be used to enable an autonomic behavior of a distributed data storage system.

As a case study, we focus on BlobSeer (Nicolae et al., 2010), a service for sharing massive data at very large scale in a multi-user environment. We propose a three-layered architecture providing BlobSeer with introspection capabilities. We validate our approach through an implementation based on the generic MonALISA (Legrand et al., 2004) monitoring framework for large-scale distributed services. Moreover, we provide two applications for the introspection layer, targeting self-configuration and self-protection, which take advantage of the introspective features that BlobSeer is equipped with.

The remainder of the paper is organized as follows. Section 2 summarizes existing efforts in the field of Grid monitoring systems, emphasizing their limitations when it comes to enabling specific introspection requirements. Section 3 explains which self-management directions fit the needs of data-management systems. Section 4 provides a brief description of BlobSeer and describes the specific introspection mechanism that we designed and implemented and the data that need to be collected in such a data-management system. Section 5 presents the applications of the introspective features of BlobSeer, namely a self-configuration module dealing with storage elasticity and the preliminary steps towards a self-protection component. In Section 6 we discuss the feasibility and efficiency of our approach, by presenting a visualization tool and a set of experiments realized on the Grid'5000 testbed. Finally, Section 7 draws conclusions and outlines directions for future developments.

2. Related work

The autonomic behavior of large-scale distributed systems aims to deal with dynamic adaptation issues by embedding the management of complex systems inside the systems themselves, relieving users and administrators from additional tasks. A distributed service, such as a storage service, is said to be autonomic if it encapsulates some autonomic behavior (Gurguis and Zeid, 2005) such as self-configuration, self-optimization, self-healing, and self-protection (Kephart and Chess, 2003).

In this context, performance evaluation becomes a critical component of any dynamic system that requires high throughput, scheduling, load balancing or analysis of applications' performance and of the communication between nodes. In Grid environments, previous research has often been limited to using historical information to create models on which various analysis and mining techniques are applied. The results were thereafter used for performing more efficient job mappings on available resources. The autonomic behavior depends on monitoring the distributed system to obtain the data on which decisions are based. Experience with production sites showed that in large distributed systems with thousands of managed components, the process of identifying the causes of faults in due time by extensive search through the potential root failure injectors proves rather time-consuming and difficult. This process may interrupt or obstruct important system services. Several techniques were used to address these issues.

One approach relies on Bayesian Networks (BNs) (Cowell et al., 1999), often used to model systems whose behaviors are not fully understood. We investigated some consistent work already done on probabilistic management in distributed systems. Hood et al. utilize Bayesian networks for the proactive detection of abnormal behavior in a distributed system (Hood and Ji, 1997). Steinder et al. apply Bayesian reasoning techniques to perform fault localization in complex communication systems (Steinder and Sethi, 2004). Ding et al. present probabilistic inference in fault management based on Bayesian networks (Ding et al., 2004). However, the Bayesian Network paradigm used within all these works does not provide direct mechanisms for modeling the temporal dependencies in dynamic systems (Santos and Young, 1999), which is essential for enhancing the autonomic behavior.

Another approach takes time into consideration by identifying the dynamic changes in distributed systems as a discrete nonlinear time series. Previous research work on scalable distributed monitoring for autonomous systems can be broadly classified into two categories: approaches relying on decentralized architectures such as hierarchical aggregation (Van Renesse et al., 2003) or peer-to-peer structures (Albrecht et al., 2005) to distribute the monitoring workload; and approaches trading off information coverage (Liang et al., 2007) or information precision (Jain et al., 2007) for lower monitoring costs. In contrast, our research focuses on identifying the relevant parameters for an autonomic introspection layer, while relying on the extension and adaptation of existing monitoring tools for tracking these parameters. The monitoring solution should further meet our needs for non-intrusiveness and minimized monitoring costs.

Exploring correlation patterns among distributed monitoring data sources has been extensively studied in various contexts such as sensor network monitoring (Vuran and Akyildiz, 2006), distributed event tracking (Jain et al., 2004), and resource discovery (Cardosa and Chandra, 2008). While the general idea of exploring temporal and spatial correlations is not new, we emphasize that applying it to distributed information tracking over large-scale networked systems requires non-trivial system analysis and design. In our case, it means discovering dynamic correlation patterns (for some predefined targeted events: node failures, malicious client intrusions, etc.) among distributed information sources, using light-weight methods instead of assuming a specific probabilistic model, as in wireless sensor networks, for instance.

The works mentioned above, although able to provide some means of monitoring for singular or aggregate services, do not dynamically replace a faulty service once a failure has been detected, nor take automated actions to optimize the overall system's performance, as our work aims to do within a large-scale distributed storage system.

3. Self-adaptation for large-scale data-management systems

A large-scale data-management platform is a complex system that has to deal with changing rates of concurrent users, with the management of huge data spread across hundreds of nodes, or with malicious attempts to access or to damage stored data. Therefore, such a system can benefit from a self-adaptation component that enables an autonomic behavior. We refine the set of self-adaptation directions that best suit the requirements of data-management systems: they match the main self-management properties defined for autonomic systems (Kephart and Chess, 2003; Parashar and Hariri, 2005).

Self-awareness is the feature that enables a system to be aware of the resource usage and the state of its components and of the infrastructure where they are running. This is mainly achieved through monitoring and interpreting the relevant information generated by the usage of the system.

Self-optimization is the ability to efficiently allocate and use resources while dealing with changing workloads. It aims at optimizing the system's performance and increasing data availability.

Self-configuration is the property that addresses the dynamic adaptation of the system's deployment scheme as a response to changing environment conditions. The system has to be able to reconfigure on the fly, when its state requires or allows for a change in the number of managed nodes.

Self-protection addresses the detection of hostile or intrusive actions directed towards the system's components and enables the system to automatically take appropriate measures to enforce security policies and make itself less vulnerable to subsequent similar attacks.

In order to improve the performance and the efficiency of resource usage in a data-sharing system, we define a set of goals that justify the need for the aforementioned properties:

Monitoring. The constant surveillance of the state of a system and of the events that trigger system reactions is the prerequisite of all the other self-adaptation directions. Thus, the self-awareness property is of utmost importance for providing support for an autonomous behavior.


(a) The architecture of the BlobSeer system (b) The architecture of the introspective BlobSeer

Fig. 1. BlobSeer

Dynamic dimensioning. The performance of data-access primitives is influenced by the number of running nodes of the data-sharing system. Moreover, the load of each component that stores data also depends on the available storage nodes and on their capacity to serve user requests. On the other hand, the workload is often unpredictable, and the deployment of the system on a large number of physical nodes can lead to underused storage nodes when the number of clients is low or the stored data are not large enough. These reasons account for the need to enhance a large-scale storage system with a mechanism that dynamically adjusts the number of deployed storage nodes. This is equivalent to taking advantage of the real-time indicators of the state of the system within a self-configuration component that can observe a heavy load or underutilized components.

Malicious client detection. A data-sharing system distributed on a large number of nodes can fit the needs of applications that generate important amounts of data only if it can provide a degree of security for the stored information. For this reason, the system has to be able to recognize malicious requests generated by unauthorized users and to block illegal attempts to inject or to modify data. Therefore, a self-protection component that enforces these requirements has to be integrated into the system.

4. Towards an introspective BlobSeer

BlobSeer is a data-sharing system which addresses the problem of efficiently storing massive, unstructured data blocks called binary large objects (referred to as BLOBs further in this paper) in large-scale, distributed environments. The BLOBs are fragmented into small, equally-sized chunks. BlobSeer provides efficient fine-grained access to the chunks belonging to each BLOB, as well as the possibility to modify them, in distributed, multi-user environments.

4.1. Architecture. The architecture of BlobSeer (Figure 1(a)) includes multiple distributed entities. Clients initiate all BLOB operations: CREATE, READ, WRITE and APPEND. Many concurrent clients can access the same BLOB or different BLOBs at the same time. The support for concurrent operations is enhanced by storing the chunks belonging to the same BLOB on multiple storage providers. The metadata associated with each BLOB are hosted on other components, called metadata providers. BlobSeer provides versioning support, so as to prevent chunks from being overwritten and to be able to handle highly-concurrent WRITE and APPEND operations. For each of them, only a patch composed of the range of written chunks is added to the system. Finally, the system comprises two more entities: the version manager, which deals with the serialization of the concurrent WRITE/APPEND requests and with the assignment of version numbers for each new WRITE/APPEND operation, and the provider manager, which keeps track of all storage providers in the system.

A typical setting of the BlobSeer system involves the deployment of a few hundred storage providers, storing BLOBs on the order of the TB. The typical size of a chunk within a BLOB can be smaller than 1 MB, whence the challenge of dealing with hundreds of thousands of chunks belonging to just one BLOB. BlobSeer provides efficient support for heavily-concurrent accesses to the stored data, reaching an aggregated throughput of 6.7 GB/s for a configuration with 60 metadata providers, 90 data providers and 360 concurrent writers, as explained in (Nicolae et al., 2009).
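The chunk- and version-oriented design described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not BlobSeer's actual implementation; the class and method names are hypothetical:

```python
# Simplified sketch of BlobSeer-style versioned BLOB storage: each
# WRITE adds only a patch (the range of written chunks), so earlier
# versions remain readable. Hypothetical illustration, not BlobSeer code.

CHUNK_SIZE = 1 << 20  # 1 MB chunks, as in a typical BlobSeer setting


class Blob:
    def __init__(self):
        self.versions = []  # version i -> {chunk_index: chunk_data}

    def write(self, offset, data):
        """Record a new version containing only the written chunks."""
        patch = {}
        for i in range(0, len(data), CHUNK_SIZE):
            patch[(offset + i) // CHUNK_SIZE] = data[i:i + CHUNK_SIZE]
        self.versions.append(patch)
        return len(self.versions) - 1  # the assigned version number

    def read(self, chunk_index, version):
        """Resolve a chunk by walking back from the requested version."""
        for patch in reversed(self.versions[:version + 1]):
            if chunk_index in patch:
                return patch[chunk_index]
        return None  # chunk never written up to this version
```

Because a WRITE never overwrites a chunk in place, a reader holding an older version number still sees consistent data, which is what allows the highly-concurrent WRITE/APPEND handling mentioned above.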


4.2. Introspection mechanisms on top of BlobSeer. We enhanced BlobSeer with introspection capabilities, in order to provide this data-sharing platform with an autonomic behavior. In (Carpen-Amarie et al., 2010), we present the three-layered architecture we designed to identify and generate relevant information related to the state and the behavior of the system (Figure 1(b)). Such information is then expected to serve as an input to a higher-level self-adaptation engine. These data are yielded by an (1) introspection layer, which processes the raw data collected by a (2) monitoring layer. The lowest layer is represented by the (3) instrumentation code that enables BlobSeer to send monitoring data to the upper layers.

4.2.1. Introspection: what data to collect? The self-adaptation engine can only be effective if it receives accurate data from the introspection layer. The latter generates data ranging from general information about the running nodes to specific data regarding the stored BLOBs and their structure.

General information. These data are essentially concerned with the physical resources of the nodes that act as storage providers. They include CPU usage, network traffic, disk usage, storage space or memory. A self-adapting system has to take into account information about the values of these parameters across the nodes that make up the system, as well as about the state of the entire system, by means of aggregated data. For instance, the used and available storage space at each single provider plays a crucial role in deciding whether additional providers are needed or not.

Individual BLOB-related data. The most significant information for a single BLOB is its access pattern, i.e., the way the chunks and the versions are accessed through READ and WRITE operations. The basic data are the number of read accesses for each chunk that the BLOB version consists of, and the number of WRITE operations performed on the BLOB for each chunk. These data facilitate the identification of the regions of the BLOB comprising chunks with a similar number of accesses, information that can influence the adopted replication strategy.

Global state. Even though the provider-allocation algorithm has access to the details within each BLOB, it is essential to have an overview of the whole data stored in the BlobSeer system, from a higher-level point of view. One of the key data at this global level is the total number of accesses associated with each provider. This is a measure of the load of each of them and can directly influence the selection of the providers that will be allocated new chunks, depending on their deviation from the average load within the system.
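The global-state criterion described above, i.e., selecting providers by their deviation from the average load, can be sketched as follows. The access counters would come from the introspection layer; the function name and input format are our own illustrative assumptions:

```python
def underloaded_providers(access_counts):
    """Return providers whose total access count is below the system
    average: candidates to be allocated new chunks.

    access_counts maps provider id -> total number of accesses, as
    aggregated by the introspection layer (hypothetical input format).
    """
    if not access_counts:
        return []
    avg = sum(access_counts.values()) / len(access_counts)
    # Rank candidates by how far below the average load they sit.
    return sorted((p for p, n in access_counts.items() if n < avg),
                  key=lambda p: access_counts[p])
```

For example, with counts {"p1": 10, "p2": 100, "p3": 40} the average load is 50, so "p1" and "p3" would be preferred for new chunk allocations, least-loaded first.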

4.2.2. Monitoring: how to collect? The input for the introspective layer consists of raw data that are extracted from the running nodes of BlobSeer, collected and then stored, a set of operations realized within the monitoring layer. Therefore, it can rely on a monitoring system designed for large-scale environments that implements these features. Such a monitoring framework has to be both scalable and extensible, so as to be able to deal with the huge number of events generated by a large-scale data-management system, as well as to accommodate system-specific monitoring information and to offer a flexible storage schema for the collected data.

The monitoring framework – MonALISA. The Global Grid Forum (GGF, 2010) proposed a Grid Monitoring Architecture (GMA) (Tierney et al., 2002), which defines the components needed by a scalable and flexible Grid monitoring system: producers, consumers, and a directory service. A wide variety of Grid monitoring systems (Zanikolas and Sakellariou, 2005), such as Ganglia (Massie et al., 2004), R-GMA (Cooke et al., 2004) and GridICE (Andreozzi et al., 2005), comply with this architecture.

Among them, we selected MonALISA (Monitoring Agents in a Large Integrated Services Architecture) (Legrand et al., 2004) for our data-monitoring tasks, as it is a general-purpose, flexible framework which provides the necessary tools for collecting and processing monitoring information in large-scale distributed systems. Moreover, it is an easily-extensible system, which allows the definition and processing of user-specific data by means of an API for dynamically-loadable modules. MonALISA is currently used to monitor large high-energy physics facilities; it is deployed on over 300 sites belonging to several experiments, such as CMS or ALICE (ALICE, 2010).

In BlobSeer, the main challenge the monitoring layer has to cope with is the large number of storage provider nodes and therefore the huge number of BLOB chunks, versions and huge BLOB sizes. Furthermore, it has to deal with hundreds of clients that concurrently access various parts of the stored BLOBs, as they generate a piece of monitoring information for each chunk accessed on each provider. MonALISA is suitable for this task, as it is a system designed for large-scale environments and it has proved to be both scalable and reliable.


Instrumenting BlobSeer. The data generated by the instrumentation layer are relayed by the monitoring system and finally fed to the introspection layer. The instrumentation layer is implemented as a component of the monitoring layer. The MonALISA framework provides a library called ApMon that can be used to send monitoring data to the MonALISA services. At the providers, the instrumentation code consists of listeners located on each of them, which report to the monitoring system each time a chunk is written or read. The monitoring information from the version manager is collected using a parser that monitors the events recorded in the logs. The state of the physical resources on each node is monitored through an ApMon thread that periodically sends data to the monitoring service.
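As a rough sketch, a provider-side listener of this kind could look as follows. This is not BlobSeer's actual instrumentation code: the class name, record format, and the `send` callback (which stands in for the ApMon call forwarding the record to a MonALISA service) are all hypothetical:

```python
import time


class ProviderListener:
    """Hypothetical sketch of the per-provider instrumentation: emit one
    monitoring record each time a chunk is read or written."""

    def __init__(self, provider_id, send):
        self.provider_id = provider_id
        self.send = send  # e.g., a wrapper around the ApMon send routine

    def on_chunk_access(self, blob_id, chunk_index, op):
        # op is "READ" or "WRITE"; one record per chunk access, which is
        # why the monitoring layer must absorb a very high event rate.
        self.send({
            "provider": self.provider_id,
            "blob": blob_id,
            "chunk": chunk_index,
            "op": op,
            "timestamp": time.time(),
        })
```

Decoupling the listener from the transport through a callback keeps the instrumentation code independent of the monitoring framework actually deployed underneath.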

5. Introducing self-adaptation for BlobSeer

To introduce an autonomic behavior in BlobSeer, we investigated two directions. The first approach aims at enhancing BlobSeer with self-configuration capabilities, as a means to support storage elasticity through the dynamic deployment of data providers. The second direction addresses the self-protection of BlobSeer from malicious clients by detecting and reacting to potential threats in real time, based on the information yielded by the introspection layer. In this section, we detail these two approaches.

5.1. Self-configuration through dynamic data provider deployment. Dynamic dimensioning is a means to achieve the self-configuration of BlobSeer, by enabling the pool of data providers to scale up and down depending on the detected needs of the system. The component we designed adapts the storage system to the environment by contracting and expanding the pool of storage providers based on the system's load.

The key idea of the Dynamic Data Provider Deployment component is the automatic decision that has to be made on how many resources the system needs to operate normally while keeping resource utilization down to a minimum. This problem is addressed by using a heuristic, calibrated through testing, based on the monitoring data. The system maintains two pools of providers:

Active Pool of Providers (APP) - the pool of providers that are currently on and actively used by the BlobSeer infrastructure.

Backup Pool of Providers (BPP) - the pool of providers that are currently off, waiting in stand-by to be activated when needed.

Fig. 2. The Dynamic Deployment module's architectural overview

The goal is to dynamically switch providers from one pool to another when certain conditions are met, in order to optimize resource usage; instead of reserving a large number of nodes which eventually are not effectively used, the system only relies on the APP and self-adapts its execution using the BPP.
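Under this two-pool model, the bookkeeping behind the switching logic can be sketched as below. The class and method names are hypothetical; in the real system, moving a provider also means issuing start or shutdown commands to the node:

```python
class ProviderPools:
    """Minimal sketch of the APP/BPP bookkeeping (hypothetical names)."""

    def __init__(self, active, backup):
        self.app = set(active)  # Active Pool of Providers: currently on
        self.bpp = set(backup)  # Backup Pool of Providers: stand-by

    def deactivate(self, provider):
        """Move an underused provider from APP to BPP (shut it down)."""
        self.app.remove(provider)
        self.bpp.add(provider)

    def activate(self):
        """Move one stand-by provider from BPP to APP (start it up)."""
        provider = self.bpp.pop()
        self.app.add(provider)
        return provider
```

Keeping the two pools disjoint makes the invariant explicit: every provider is either serving requests or in stand-by, never both.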

5.1.1. Architectural Overview. The dynamic deployment decision is based on retrieving the monitoring data and computing a score that evaluates the status of each provider. The monitoring data are retrieved from two different sources, each one with specific metrics: BlobSeer-related data and physical resource information. These data are stored and processed using a Monitoring Repository. Based on the real-time monitoring information, the decision algorithm computes a heuristic score. Its value determines the decision of removing or adding a node to the active pool of providers.


In order to take the corresponding action based on the result obtained, the application needs to get from the Provider Manager a list of available nodes (Data Providers) which can be turned on or off, depending on the decision taken. This part is also responsible for notifying the BlobSeer system, specifically the Provider Manager, of the changes made in the system.

Fig. 3. The sequence diagram of the dynamic providers deployment

The main actors of the Dynamic Deployment service are the decision-taking component (the ProviderPoolManager) and the decision enforcement component (the ProviderMover), as depicted in Figure 2. The ProviderPoolManager analyzes the monitoring information and, using some configurable policies, takes the decision of either enabling or disabling a set of Data Providers. The ProviderMover is responsible for putting this decision into practice by moving a provider from the Active Pool of Providers to the Backup Pool of Providers or vice versa, depending on the commands it receives from the ProviderPoolManager.

The interaction with BlobSeer's Provider Manager consists of requests for the list of the active Data Providers running in the system at a specific moment in time. The ProviderPoolManager reads the coordinates of the Provider Manager and contacts it to obtain a list of tuples (host, port) that point to the nodes where Data Providers are active. The ProviderMover also manages the two pools, APP and BPP, and the providers' migration between them. The ProviderMover notifies the Provider Manager of a change in the APP. If the notification fails, the ProviderMover does not retry it, relying instead on the watchdog facility implemented in BlobSeer, which scans the entire list of providers to track the active ones. Finally, the ProviderMover communicates directly with the Data Providers and issues the start or shutdown commands through which a provider is moved from BPP to APP or from APP to BPP, respectively.

The sequence diagram depicted in Figure 3 illustrates the flow of actions within the Dynamic Deployment module. The monitoring data is retrieved continuously, as a separate process, by the monitoring module, and is stored into a monitoring repository. The ProviderPoolManager connects to the Provider Manager to get the list of active providers. Once this data is obtained, the Pool Manager starts computing a score for each provider. Based on a configuration file specifying the targeted scenarios and the heuristic used, a decision is taken and communicated to the ProviderMover. This, in turn, calls the scripts that start or stop a particular provider.
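The control flow above can be sketched as a simple periodic loop. All class and method names below are hypothetical stand-ins for the actual BlobSeer components, not the real API:

```python
# Illustrative sketch of the Dynamic Deployment control loop. The
# ProviderPoolManager scores each active Data Provider and asks the
# ProviderMover to retire those whose score crosses a threshold.

class ProviderPoolManager:
    def __init__(self, provider_manager, monitoring_repo, mover,
                 score_fn, score_threshold=0.5):
        self.provider_manager = provider_manager  # source of (host, port) tuples
        self.monitoring_repo = monitoring_repo    # introspection-layer output
        self.mover = mover                        # decision-enforcement component
        self.score_fn = score_fn                  # configurable scoring heuristic
        self.score_threshold = score_threshold    # assumed value; read from config

    def run_once(self):
        # 1. Ask the Provider Manager for the currently active providers.
        for provider in self.provider_manager.get_active_providers():
            # 2. Retrieve the monitoring data collected for this node.
            metrics = self.monitoring_repo.get_metrics(provider)
            # 3. Compute a score from the configured scenario factors and,
            # 4. if it crosses the threshold, let the ProviderMover migrate
            #    the provider from the active pool (APP) to the backup
            #    pool (BPP), where its shutdown script is run.
            if self.score_fn(metrics) >= self.score_threshold:
                self.mover.move_to_backup(provider)

# Minimal stubs showing how the loop would be driven:
class StubPM:
    def get_active_providers(self):
        return [("node1", 9000), ("node2", 9000)]

class StubRepo:
    def get_metrics(self, provider):
        return {"idle": provider[0] == "node1"}

class StubMover:
    def __init__(self):
        self.moved = []
    def move_to_backup(self, provider):
        self.moved.append(provider)

mover = StubMover()
ProviderPoolManager(StubPM(), StubRepo(), mover,
                    score_fn=lambda m: 1.0 if m["idle"] else 0.0).run_once()
```
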

5.1.2. Heuristic Providers Evaluation. The scoring algorithm provides a method to detect which providers should be moved from APP to BPP. The factors to be taken into consideration and tracked using the introspection layer can be divided into two subcategories: physical factors (depending on the physical node that runs the Provider, e.g., the free disk space, the average bandwidth usage, the CPU load or the system uptime) and BlobSeer factors (metrics referring to the BlobSeer behavior, e.g., the number of read/write accesses per time unit, the size of stored chunks and BLOBs, the replication degree).

We illustrate this approach with a common scenario identified by the Dynamic Providers Deployment module and treated accordingly by stopping the unnecessary providers. In this case, if the introspection layer detects that on one provider the free disk space is above the 70% threshold, the replication factor for the stored chunks is greater than 1, and the read and write access rate is small (e.g., less than one access per hour), it decides to shut down the provider. All the values referred to above are adjustable through a configuration file. The current values were chosen based on a set of performance evaluation experiments aiming to identify the trade-off between the costs of shutting down one provider and moving its data to another one, and the benefits of using fewer resources. The scenario illustrates the case of a provider with extra disk space available that is not used by clients. Considering that all the stored data is also replicated on other providers, it is reasonable to shut down this provider in order to use the available resources efficiently. The stopping decision is only taken when the shutdown costs are smaller and there are available nodes to which the data can be transferred in order to preserve the replication factors.

The self-configuration engine is not limited to detecting this type of scenario; several other patterns are identifiable using a simple specification mechanism. The conditions making up the scenarios are modeled as factors, used to compute a score for each provider. The heuristic used in the score computation is based on weight factors, using the following formula:

S = ∑_{i=1}^{n} wft_i · wcf_i        (1)

where wft_i represents the weight of the factor i in the total score and wcf_i represents the weight of the true condition of the factor i. With these notations, the pseudo-code for scaling down data providers is presented in Algorithm 1.

Algorithm 1 Scaling down data providers.
 1: procedure SCALING_DOWN(DataProvidersList)
 2:   for all DataProvider in DataProvidersList do
 3:     RetrieveMonitoringData(DataProvider)
 4:     S ← ComputeScore(DataProvider)
 5:     if S < scoreThreshold then
 6:       Keep DataProvider in APP
 7:     else
 8:       if DataReplicationDegree > replicationThreshold then
 9:         Move DataProvider to BPP
10:         AvailableProviders ← retrieve available providers from the Provider Manager
11:         TransferDataTo(AvailableProviders)
12:         Update the metadata
13:         ShutDown(DataProvider)
14:       else
15:         Keep DataProvider in APP
16:       end if
17:     end if
18:   end for
19: end procedure
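A minimal sketch of how formula (1) could be evaluated for one provider follows; the factor names, weights and thresholds are hypothetical illustrations of the configurable values mentioned above:

```python
def compute_score(metrics, factors):
    """Evaluate S = sum_i wft_i * wcf_i for one provider.

    `factors` maps a factor name to (wft, condition, wcf_true, wcf_false):
    wft is the factor's weight in the total score, and the wcf weight is
    chosen depending on whether the condition holds for this provider.
    """
    score = 0.0
    for wft, condition, wcf_true, wcf_false in factors.values():
        score += wft * (wcf_true if condition(metrics) else wcf_false)
    return score

# Hypothetical factors for the scenario above: ample free disk space,
# replicated chunks and a low access rate all raise the score, i.e.,
# they push the provider towards the backup pool (cf. Algorithm 1).
factors = {
    "free_disk":   (0.4, lambda m: m["free_disk_pct"] > 70, 1.0, 0.0),
    "replication": (0.3, lambda m: m["replication_degree"] > 1, 1.0, 0.0),
    "access_rate": (0.3, lambda m: m["accesses_per_hour"] < 1, 1.0, 0.0),
}

idle_node = {"free_disk_pct": 85, "replication_degree": 2, "accesses_per_hour": 0}
busy_node = {"free_disk_pct": 20, "replication_degree": 1, "accesses_per_hour": 50}
```
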

5.2. Self-protection through malicious clients detection. Detecting malicious clients is the first step towards enabling self-protection for the BlobSeer system. Such a feature has to take into account several types of security threats and to react when such attacks occur.

In this section, we propose a simple malicious clients detection mechanism that focuses on protocol breaches within BlobSeer, as this is a critical vulnerability of a data-management system that enables the clients to directly access the data storage nodes in order to provide very efficient data transfers. The goal of the detection component is to identify the known forms of protocol misuse, and thus to help the system maintain the stored data in a consistent state.

5.2.1. Protocol breach scenarios for BlobSeer. A malicious user can try to compromise the system by deliberately breaking the data-insertion protocols.

Algorithm 2 Data-writing step.
 1: procedure WRITE_DATA(buffer, offset, size)
 2:   wid ← generate unique write id
 3:   noCh ← ⌈size/chSize⌉
 4:   P ← get noCh providers from the provider manager
 5:   D ← ∅
 6:   for all 0 ≤ i < noCh in parallel do
 7:     cid ← generate unique chunk id
 8:     chOffset ← chSize ∗ i
 9:     store buffer[chOffset .. chOffset + chSize] as chunk (cid, wid) on provider P[i]
10:     D ← D ∪ {(cid, wid, i, chSize)}
11:     Pglobal ← Pglobal ∪ {(cid, P[i])}
12:   end for
13: end procedure
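The chunking arithmetic of the data-writing step can be sketched as follows. Provider selection and the chunk transfer are stubbed out through callbacks, and the chunk-id scheme is a simplification (the paper assumes the written size is a multiple of chSize):

```python
import math

def write_data(buffer, chunk_size, get_providers, store_chunk, wid):
    """Sketch of Algorithm 2: split `buffer` into fixed-size chunks and
    send each one (here: via the `store_chunk` callback) to one of the
    providers returned by the provider manager.
    Returns the chunk descriptor map D."""
    no_ch = math.ceil(len(buffer) / chunk_size)   # noCh ← ceil(size/chSize)
    providers = get_providers(no_ch)              # P ← noCh providers
    D = []                                        # chunk descriptor map
    for i in range(no_ch):                        # done in parallel in BlobSeer
        cid = (wid, i)                            # simplified unique chunk id
        ch_offset = chunk_size * i
        store_chunk(providers[i], cid,
                    buffer[ch_offset:ch_offset + chunk_size])
        D.append((cid, wid, i, chunk_size))
    return D

# Usage with trivial stand-ins for the provider manager and the RPC:
stored = {}
D = write_data(b"x" * 20, 8,
               get_providers=lambda n: [f"p{i}" for i in range(n)],
               store_chunk=lambda prov, cid, data: stored.__setitem__(prov, data),
               wid=7)
```
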

This kind of behavior is a starting point for DoS attacks, in which the user attempts to overload the system through large numbers of malformed or incomplete requests. To cope with this security risk, specific mechanisms have to be developed to quickly detect the illegal accesses and isolate the user that initiated them.

The most vulnerable data access operation is writing data into BlobSeer, as it gives a malicious user not only the opportunity to overload the system and to increase its response time, but also the means to make corrupted data available.

The WRITE operation imposes a strict protocol on the user that wants to correctly insert data into the system. We consider the typical case of WRITE operations in BlobSeer, that is, when a user attempts to write a continuous range of chunks to a specific BLOB. For simplicity, we can assume that the WRITE operation consists of two independent phases that have to be executed consecutively. These two steps can be summarized as follows (the full description of the data access primitives in BlobSeer can be found in (Nicolae et al., 2010)):

The data-writing step. A simplified description of this operation is provided in Algorithm 2. We assume the size of the data to be written is a multiple of a predefined chunk size, denoted chSize, as this is often the case in BlobSeer. The input parameters of this step are the data to be written as a string buffer, the offset within the BLOB where the data has to be inserted and the size of the sequence.

The client connects to the provider manager and requests a list of data providers, P, which can host the chunks to be written. Then, the chunks are sent in parallel to the data providers, together with a unique identifier, cid, and the identifier of the WRITE operation, wid. Upon the successful completion of this step, the information associated with all the written chunks will be stored in a chunk descriptor map denoted D. Additionally, the providers that hold each cid are stored in Pglobal, a container where the addresses of all the chunks in the system are saved.

Algorithm 3 Data-publication step.
 1: procedure PUBLISH_DATA(offset, size, D, wid)
 2:   writeInfo ← invoke remotely on version manager ASSIGN_VERSION(offset, size, wid)
 3:   BUILD_METADATA(writeInfo, D)
 4:   invoke remotely on version manager COMPLETE_WRITE(writeInfo)
 5: end procedure

The data-publication step is represented by the creation of the metadata associated with the written data and the publication of the written chunk range as a new version, as described in Algorithm 3.

First, the client asks the version manager for a new version for its chunk list, and then it proceeds to the creation of metadata, starting from the chunk descriptor map D generated in the first step. The write is finalized after the client successfully invokes the COMPLETE_WRITE procedure on the version manager, which in turn is responsible for publishing the new version of the BLOB.

A correct WRITE operation is defined as the successful completion of the aforementioned steps, with the constraint that the published information concerning the written chunk range is consistent with the actual data sent to the data providers, that is, the values of D and wid that are sent to the version manager correspond to chunks that have been written on data providers. As a consequence, there are two types of protocol breaches that can be detected for the WRITE operation:

Data written and not published. In this case, a malicious user obtains a list of providers from the provider manager and then starts writing data to the providers. The second step is never issued, and thus the version manager, which keeps track of all the BLOBs and their versions, will never be aware of the data inserted into the system. This kind of protocol breach can be developed into a Denial of Service (DoS) attack, targeted at overloading one or more data providers.

Publication of inconsistent data. The attack that corresponds to this situation aims to disrupt the computations that use data stored by the BLOBs. As an example, a user may attempt to compromise the system by making available data that does not actually exist. Therefore, an application can start reading and processing the data without being aware that the metadata contain fake references. Hence the computation would be compromised and the application forced to restart the processing.

5.2.2. The detection mechanism. Enabling self-protection in BlobSeer relies on coupling a malicious-clients detection module with the introspection layer. On the one hand, such a module has to identify the malicious activities that attempt to compromise the system and to isolate the users that initiate them. On the other hand, it should not interfere with BlobSeer operations, so as to preserve the efficient data accesses for which BlobSeer is optimized. The introspection layer processes information monitored independently of the interactions between the user and the system, and thus it is an ideal candidate to provide input data for a malicious clients detection module.

We implemented a detection module that addresses the protocol-breach attacks and generates blacklists of the users that attempt them. Its input data are provided as a history of the users' actions by the introspection layer, which constantly monitors the real-time data accesses and updates the history. The user history stores the following types of monitoring parameters:

Data generated by the data providers. The monitoring information collected from the data providers consists of tuples that aggregate the information about the stored data chunks. The data corresponding to a new chunk written into the system is defined as a tuple denoted (cid, wid, noCh, chSize, ts), where wid is the write identifier generated in the data-writing step and ts is the timestamp attached by the monitoring system when the data is recorded. Note that for each wid there can be several records in the user history (with different timestamps), as not all the chunk writes are recorded by the monitoring system at the same time.

Data obtained from the version manager. The introspection system records each new version published by the version manager in the form of tuples defined as (cid, wid, v, offset, size, ts), where wid is the same write identifier used for the data-writing step, v is the new published version, offset and size identify the chunk range written into the system, and ts is the timestamp assigned by the monitoring system.

Algorithm 4 Malicious clients detection.
 1: BL ← ∅
 2: lastTsChecked ← 0
 3: procedure DETECT_ILLEGAL_PUBLISH
 4:   maxTs ← getCurrentTime() − windowSize
 5:   PW ← get list of published writes such that ts > lastTsChecked and ts ≤ maxTs
 6:   DW ← get list of data writes such that ts > lastTsChecked − windowSize
 7:   lastTsChecked ← max(ts) from PW
 8:   for all p ∈ PW, p = (cid, wid, offset, size, v) do
 9:     if ∄ d ∈ DW, d = (cid_d, wid_d, noCh_d, chSize_d, ts_d) such that cid_d = cid and wid_d = wid then
10:       BL ← UPDATE_SCORE(BL, cid, p)
11:     else
12:       if size ≠ ∑_{d∈DW} noCh_d ∗ chSize_d then
13:         BL ← UPDATE_SCORE(BL, cid, p)
14:       end if
15:     end if
16:   end for
17: end procedure

The detection module comprises two components, each of them dealing with a specific type of protocol breach. The detection mechanism for inconsistent data publication is presented in Algorithm 4. The DETECT_ILLEGAL_PUBLISH procedure is executed periodically, and each time it inspects the most recent monitoring data recorded by the introspection module. The procedure searches for published versions that have no corresponding written data chunks, or for which the written range of chunks does not match the published information. Each published write is matched against the set of chunk writes that occurred in a predefined time window, denoted windowSize, surrounding its timestamp. If no chunk writes are found with the same client identifier and write id, or if the total size of the written chunks does not match the published size, the client is added to a global blacklist BL. Once blacklisted, a client is also associated with a score, which can be computed according to the type of illegal action. As an example, if no chunks are written, the UPDATE_SCORE procedure computes a score proportional to the write size declared by the publication step.
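The matching step of this mechanism can be sketched as below, using plain tuples laid out as the monitoring records described above; the sliding-window bookkeeping (lastTsChecked) is omitted and the blacklist score is simply the amount of fake data:

```python
def detect_illegal_publish(published, data_writes, window_size, now):
    """Match published writes against recorded chunk writes and return a
    blacklist mapping client ids to scores.

    published:   iterable of (client_id, wid, offset, size, v, ts) records
    data_writes: iterable of (client_id, wid, no_ch, ch_size, ts) records
    """
    blacklist = {}
    max_ts = now - window_size   # give the monitoring window time to fill
    for client_id, wid, offset, size, v, ts in published:
        if ts > max_ts:
            continue  # too recent: chunk records may not have arrived yet
        # All chunk-write records with the same client id and write id.
        matches = [d for d in data_writes
                   if d[0] == client_id and d[1] == wid]
        written = sum(no_ch * ch_size for _, _, no_ch, ch_size, _ in matches)
        if not matches:
            # Data published but never written: score the client
            # proportionally to the declared (fake) size.
            blacklist[client_id] = blacklist.get(client_id, 0) + size
        elif written != size:
            # Published size inconsistent with what was actually written.
            blacklist[client_id] = blacklist.get(client_id, 0) + abs(size - written)
    return blacklist

published = [("alice", 1, 0, 16, 1, 10),   # consistent publication
             ("bob",   2, 0,  8, 1, 10)]   # never wrote any chunk
data_writes = [("alice", 1, 2, 8, 9)]      # alice wrote 2 chunks of size 8
blacklist = detect_illegal_publish(published, data_writes,
                                   window_size=5, now=20)
```
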

The goal of the detection mechanism is to keep track of the malicious users and to feed this information back into the BlobSeer system, so as to enable it to react when receiving new requests from the users identified as malicious. The malicious users can be made available to the provider manager as a blacklist in which each user's score shows the amount of fake data that the user introduced into the BlobSeer system. The provider manager implements the allocation strategy that assigns providers for each user WRITE operation. Being aware of the blacklist, the provider manager can decide to block the malicious users by not granting them providers when they want to write again into the system. The behavior of the provider manager can be further refined by taking into account the score associated with each client. In this case, there are several other constraints that can be enforced on the users, such as a decreased bandwidth for their WRITE operations, a waiting time imposed before being assigned the necessary list of providers, or a size limit for the data written.
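One possible shape for such a graduated policy in the provider manager is sketched below; the thresholds and penalty values are hypothetical, not values from the paper:

```python
def allocation_policy(score):
    """Map a client's blacklist score to a (blocked, wait_seconds,
    size_limit) policy tuple applied before granting providers for a
    WRITE request. All thresholds are assumed for illustration."""
    if score == 0:
        return (False, 0, None)       # normal client: no restriction
    if score < 100:
        # mildly blacklisted: impose a waiting time and a size limit
        return (False, 30, 1 << 30)   # 30 s delay, 1 GB cap
    # heavily blacklisted: refuse to grant providers at all
    return (True, 0, 0)
```
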

6. Experimental evaluation

We evaluated the feasibility of gathering and interpreting the BlobSeer-specific data needed as input for the different self-optimizing directions. Our approach was to create an introspection layer on top of the monitoring system, able to process the raw data collected from BlobSeer and to extract significant information regarding the state and the behavior of the system. We performed a series of experiments that evaluate the introspection layer and also provide some preliminary results concerning the introduction of self-protection capabilities in BlobSeer. The experiments were conducted on the Grid'5000 (Jégou et al., 2006) testbed, a large-scale experimental Grid platform that covers 9 sites geographically distributed across France.

6.1. Visualization tool for BlobSeer-specific data. We implemented a visualization tool that can provide a graphical representation of the most important parameters yielded by the introspection layer.

We show the outcome of the introspection layer through an evaluation performed on 127 nodes belonging to a Grid'5000 cluster in Rennes. The nodes are equipped with x86_64 CPUs and at least 4 GB of RAM. They are interconnected through a Gigabit Ethernet network. We deployed each BlobSeer entity on a dedicated node, as follows: two nodes were used for the version manager and the provider manager, 10 nodes for the metadata providers, 100 nodes for the storage providers, and 10 nodes acted as BlobSeer clients, writing data to the BlobSeer system. Four nodes hosted MonALISA monitoring services, which transferred the data generated by the instrumentation layer built on top of the BlobSeer nodes to a MonALISA repository. The repository is the location where the data were stored and made available to the introspection layer.

Fig. 4. Visualization for BlobSeer-specific data. (a) Number of WRITE accesses on each chunk of a BLOB (each chunk is identified by its position within the BLOB). (b) The size of all the stored versions of a BLOB.

In this experiment, we used 10 BLOBs, each of them having a chunk size of 1 MB and a total size larger than 20 GB. We created the BLOBs and wrote 10 data blocks of 2 GB to each BLOB. Each data block overlaps the previous one by 10%. Next, we started 10 clients in parallel, and each of them performed a number of WRITE operations on a randomly selected BLOB. The blocks were written to the BLOB at random offsets and consisted of a random number of chunks, ranging between 512 MB and 2 GB in size.

We processed the raw data collected by the monitoring layer and extracted the higher-level data within the introspection layer. Some results are presented below, along with their graphical representations.

Access patterns. They represent significant information that the introspection layer has to be aware of, and they can be obtained by computing the number of READ/WRITE accesses. The access patterns can be examined from two points of view. The first one regards the access patterns for each BLOB. It considers the number of READ or WRITE accesses for each chunk, for a specified version or for the whole BLOB, and it identifies the regions of the BLOB composed of chunks with the same number of accesses (Figure 4(a)). The other one refers to the number of READ or WRITE operations performed on each provider, allowing for a classification of the providers according to the pressure of the concurrent accesses they have to withstand.
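Deriving such a per-chunk access pattern from the monitored events can be sketched as follows (the event layout is an assumption for illustration, not the actual MonALISA record format):

```python
from collections import Counter

def chunk_access_histogram(events, blob_id, op="WRITE"):
    """Count accesses per chunk position for one BLOB, given monitoring
    events assumed to be (blob_id, op, chunk_index) tuples. Regions of
    the BLOB with equal counts can then be read off the histogram."""
    return Counter(idx for b, o, idx in events if b == blob_id and o == op)

events = [(1, "WRITE", 0), (1, "WRITE", 0), (1, "WRITE", 1),
          (2, "WRITE", 0), (1, "READ", 0)]
hist = chunk_access_histogram(events, blob_id=1)  # chunk 0 hit twice, chunk 1 once
```
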

The size of all the stored versions of a BLOB. The differences between the versions of the same BLOB are presented in Figure 4(b), where the size of the new data introduced by each version into the system is shown in MB. This information, correlated with the number of accesses for each version, can be used to identify versions that correspond to a small amount of data and are seldom accessed. Such observations are necessary for a self-optimization component that handles the replication degree of each version.

6.2. Impact of the introspection architecture on the BlobSeer data-access performance. This experiment is designed to evaluate the impact of using the BlobSeer system in conjunction with the introspection architecture. The introspective layer collects data from BlobSeer without disrupting the interactions between its components, and thus no constraint is enforced on the user's accesses to the BlobSeer entities. In this way, the throughput of the BlobSeer system is not influenced by the detection module. The only downside of such a system is the intrusiveness of the instrumentation layer, which runs at the level of the BlobSeer components and may decrease their performance.

For this experiment we used the Grid'5000 clusters located in Rennes and Orsay. The nodes are equipped with x86_64 CPUs and at least 2 GB of RAM. We used a typical configuration for the BlobSeer system, which enables the system to store massive amounts of data that can reach the order of TB. It consists of 150 data providers, 20 metadata providers, one provider manager and one version manager. Both data and metadata providers store data on their hard disks, and they are configured to store up to 64 GB and 8 GB, respectively. The MonALISA monitoring services are deployed on 20 nodes, and they collect monitoring data from all the providers, each provider being dynamically assigned to a monitoring service in the deployment phase. The repository that gathers all the monitored parameters is located outside Grid'5000, as is the detection module, which interacts only with the repository's database. Each entity is deployed on a dedicated physical machine.

Fig. 5. Performance evaluations. (a) The aggregated throughput (MB/s) of the WRITE operation, as a function of the number of clients, for BlobSeer (BS) and for BlobSeer with the monitoring support enabled (BSMON). (b) The WRITE duration and the detection delay (s), as a function of the number of clients, when concurrent clients that publish data without writing it (PNW) access the BlobSeer system.

This test consists of deploying a number of concurrent clients that each perform a single WRITE operation. Each client writes 1 GB of data in a separate BLOB, using a chunk size of 8 MB. We analyze the aggregated throughput of the BlobSeer WRITE operation obtained when deploying it standalone, compared with BlobSeer outfitted with the introspection layers. The throughput is measured for a number of clients ranging from 5 to 80, and the experiment was repeated 3 times for each value of the number of clients deployed. Figure 5(a) shows that the performance of the BlobSeer system is not influenced by the addition of the instrumentation code and the generation of the monitoring parameters, as in both cases the system is able to sustain the same throughput. Since the introspective layer computes its output based on the monitored data generated for each written chunk, the more fine-grained BLOBs we use, the more monitoring information has to be processed. For this test, each BLOB consists of 128 chunks, and therefore the introspective component performs well even when the number of generated monitoring parameters reaches 10,000, as is the case when testing it with more than 80 clients.

6.3. Malicious clients detection. We aim to explore the first step towards a self-protecting BlobSeer system by building a component that can detect illegal actions and prevent malicious users from damaging the stored data. To reach this goal, the detection mechanism for the malicious users has to deliver an accurate image of the users' interaction with BlobSeer. Moreover, it has to expose the illegal operations as fast as possible, so as to limit the size of the data illegally injected into the system and to prevent the malicious users from carrying on the harmful accesses. We define the detection delay as the duration of the detection phase after the end of the client's operations. We use the detection delay as a measure of the performance of the detection module.

The aim of this experiment is to analyze the performance of the detection module when the system is accessed by multiple concurrent malicious clients that publish data without actually writing them. This access pattern corresponds to a scenario where a number of clients access a reputation-based data-storage service. Each client can increase his reputation by sharing a large amount of data with the other users of the system. To achieve this goal, a malicious client may pretend to share huge data, while it simply skips the data-writing phase of the WRITE operation and publishes nonexistent data.

The deployment settings are identical to the previous experiment. We want to assess the behavior of the system under illegal concurrent accesses; thus we deploy only malicious clients, repeating the test with an increasing number of clients, ranging from 5 to 80. We measure both the duration of the WRITE operation of the client and the delay between the beginning of the WRITE and the detection of the client that initiated it as malicious. All the clients start writing at the same time, thus having the same start time. For each point in the chart, we compute the average duration over all the clients deployed for that run. The results in Figure 5(b) show that the delay between the end of the write operation and the detection of the malicious clients remains constant as the number of clients increases. This is a measure of the scalability of our approach, showing that the detection process is able to cope with a large number of concurrent clients and to deliver results fast enough to allow the system to block the attackers, while sustaining the same level of performance.

7. Conclusions and future work

This paper addresses the challenges raised by the introduction of introspection into a data-management system for large-scale, distributed infrastructures. Such a feature aims at exposing general and service-specific data to a higher-level layer, in order to enable the system to evolve towards an autonomic behavior. We propose a layered architecture built on top of the BlobSeer data-management system, a service dedicated to large-scale sharing of massive data. The goal of this architecture is to generate a set of specific data that can serve as input for a self-adaptive engine.

We also proposed a dynamic dimensioning module and a malicious clients detection component that rely on data yielded by the introspection layer. By reacting in real time to changes in the state of the system, they represent the first step towards enhancing this system with self-configuration and self-protection capabilities.

To build the monitoring layer, we relied on the MonALISA general-purpose, large-scale monitoring framework, for its versatility and extensibility. Our experiments showed that it was able to scale with the number of BlobSeer providers and to cope with the huge amount of monitoring data generated by a large number of clients. Moreover, it allowed us to define and collect BlobSeer-specific data, as well as to visualize graphical representations associated with the various high-level data extracted.

The next step will consist in equipping BlobSeer with other self-adaptive components in order to optimize the system's performance and resource usage. As an example, by allowing the provider manager to rely on introspection data, this engine will help improve the storage resource allocation strategies. Besides, it can also provide information based on which adaptive data replication strategies can be implemented. Together, such features will enable an autonomic behavior of the BlobSeer data-management platform.

Acknowledgment

Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see http://www.grid5000.org/).

ReferencesAlbrecht, J., Oppenheimer, D., Vahdat, A. and Patterson,

D. A. (2005). Design and implementation tradeoffsfor wide-area resource discovery, In Proceedings of14th IEEE Symposium on High Performance, ResearchTriangle Park, IEEE Computer Society, pp. 113–124.

ALICE (2010). The MonALISA Repository for ALICE,http://pcalimonitor.cern.ch/map.jsp.

Andreozzia, S., De Bortoli, N., Fantinel, S. et al. (2005).GridICE: a monitoring service for grid systems, Fu-ture Generation Computer Systems 21(4): 559–571.

Cardosa, M. and Chandra, A. (2008). Resource bun-dles: Using aggregation for statistical wide-area re-source discovery and allocation, 28th IEEE Inter-national Conference on Distributed Computing Systems(ICDCS 2008), Beijing, China, pp. 760–768.

Carpen-Amarie, A., Cai, J., Costan, A., Antoniu, G. andBougé, L. (2010). Bringing introspection into theBlobSeer data-management system using the Mon-ALISA distributed monitoring framework, First In-ternational Workshop on Autonomic Distributed Systems(ADiS 2010), Krakow, Poland, pp. 508–513. Held inconjunction with CISIS 2010 Conference.

Cooke, A., Gray, A., Nutt, W. et al. (2004). The relationalgrid monitoring architecture: Mediating informationabout the grid, Journal of Grid Computing 2(4): 323–339.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L. and Spiegel-halter, D. J. (1999). Probabilistic Networks and ExpertSystems, Springer-Verlag, New York.

Ding, J., Krämer, B. J., Bai, Y. and Chen, H. (2004). Prob-abilistic inference for network management, Univer-sal Multiservice Networks: Third European Conference,ECUMN 2004, pp. 498–507.

GGF (2010). The Global Grid Forum, http://www.ggf.org/.

Gunter, D., Tierney, B., Crowley, B., Holding, M. and Lee,J. (2000). Netlogger: A toolkit for distributed systemperformance analysis, MASCOTS ’00: Proceedings ofthe 8th International Symposium on Modeling, Analysisand Simulation of Computer and Telecommunication Sys-tems, IEEE Computer Society, Washington, DC, USA,p. 267.

Gurguis, S. and Zeid, A. (2005). Towards autonomic webservices: Achieving self-healing using web services,DEASÕ05: Proceedings of Design and Evolution of Auto-nomic Application Software Conference, Missouri, USA.

Hood, C. and Ji, C. (1997). Automated proactive anomalydetection, Proceedings of IEEE International Conferenceof Network Management (IM97), San Diego, California,pp. 688–699.

Jain, A., Chang, E. Y. and Wang, Y.-F. (2004). Adaptivestream resource management using Kalman filters,SIGMOD ’04: Proceedings of the 2004 ACM SIGMODInternational Conference on Management of data, ACM,New York, NY, USA, pp. 11–22.

Page 16: Bringing Introspection into BlobSeer: Towards a Self ... · by extensive search through the potential root fail-ure injectors proves rather time-consuming and dif-ficult. This process

14 A. Carpen-Amarie et al.

Jain, N., Kit, D., Mahajan, P., Yalagandula, P., Dahlin, M.and Zhang, Y. (2007). STAR: self-tuning aggregationfor scalable monitoring, VLDB ’07: Proceedings of the33rd international conference on Very Large Data Bases,VLDB Endowment, pp. 962–973.

Jégou, Y. et al. (2006). Grid’5000: a large scale and highlyreconfigurable experimental grid testbed., Interna-tional Journal of High Performance Computing Applica-tions 20(4): 481–494.

Kephart, J. O. and Chess, D. M. (2003). The vision of auto-nomic computing, Computer 36(1): 41–50.

Legrand, I., Newman, H., Voicu, R. et al. (2004). Mon-ALISA: An agent based, dynamic service system tomonitor, control and optimize grid based applica-tions, Computing for High Energy Physics, Interlaken,Switzerland.

Liang, J., Gu, X. and Nahrstedt, K. (2007). Self-configuringinformation management for large-scale service over-lays, INFOCOM 2007. 26th IEEE International Con-ference on Computer Communications, Joint Conferenceof the IEEE Computer and Communications Societies,pp. 472–480.

Massie, M., Chun, B. and Culler, D. (2004). The Gangliadistributed monitoring system: design, implementa-tion, and experience, Parallel Computing 30(7): 817–840.

Nicolae, B., Antoniu, G. and Bougé, L. (2009). Enablinghigh data throughput in desktop grids through de-centralized data and metadata management: TheBlobSeer approach, Proceedings of the 15th Inter-national Euro-Par Conference, Delft, Netherlands,pp. 404–416.

Nicolae, B., Antoniu, G., Bougé, L., Moise, D. and Carpen-Amarie, A. (2010). BlobSeer: Next generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing.

Parashar, M. and Hariri, S. (2005). Autonomic computing: An overview, Unconventional Programming Paradigms, Springer Verlag, pp. 247–259.

Santos, Jr., E. and Young, J. D. (1999). Probabilistic temporal networks: A unified framework for reasoning with time and uncertainty, International Journal of Approximate Reasoning 20(3): 263–291.

Steinder, M. and Sethi, A. S. (2004). Probabilistic fault localization in communication systems using belief networks, IEEE/ACM Transactions on Networking 12(5): 809–822.

Tierney, B., Aydt, R. and Gunter, D. (2002). A grid monitoring architecture, Grid Working Draft GWD-PERF-16-3. http://www.gridforum.org/.

Van Renesse, R., Birman, K. P. and Vogels, W. (2003). Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining, ACM Transactions on Computer Systems 21(2): 164–206.

Vuran, M. C. and Akyildiz, I. F. (2006). Spatial correlation-based collaborative medium access control in wireless sensor networks, IEEE/ACM Transactions on Networking 14(2): 316–329.

Zanikolas, S. and Sakellariou, R. (2005). A taxonomy of grid monitoring systems, Future Generation Computer Systems 21(1): 163–188.

Alexandra Carpen-Amarie received her Engineering Degree in 2008 from the Computer Science Department of Politehnica University Bucharest, Romania. She is currently a Ph.D. student at ENS Cachan, France, working in the KerData Team at INRIA Rennes - Bretagne Atlantique. Her research interests include: large-scale distributed data storage, cloud computing, monitoring in distributed systems.

Alexandru Costan is a Ph.D. student and Teaching Assistant at the Computer Science Department of the Politehnica University of Bucharest. His research interests include: Grid Computing, Data Storage and Modeling, P2P systems. He is actively involved in several national and international research projects related to these domains, among which it is worth mentioning MonALISA, MedioGRID, EGEE, P2P-NEXT, and BlobSeer. His Ph.D. thesis focuses on Data Storage, Representation and Interpretation in Grid Environments. He has received a Ph.D. Excellency Grant from Oracle and was awarded an IBM Ph.D. Fellowship in 2009.

Jing Cai is an MPhil student in the Department of Computer Science, City University of Hong Kong. He worked with the KerData team at the INRIA Rennes research center as a research intern in 2009. His research interests include distributed computing and monitoring in grid computing environments.

Gabriel Antoniu is a Research Scientist at INRIA Rennes - Bretagne Atlantique (CR1) and is a member of the KerData research team. His research interests include: grid and cloud distributed storage, large-scale distributed data management and sharing, data consistency models and protocols, grid and peer-to-peer systems. Gabriel Antoniu received his Bachelor of Engineering degree from INSA Lyon in 1997; his Master degree in Computer Science from ENS Lyon in 1998; his Ph.D. degree in Computer Science in 2001 from ENS Lyon; and his Habilitation for Research Supervision (HDR) from ENS Cachan in 2009.

Luc Bougé, Professor, is the Chair of the Informatics and Telecommunication Department (DIT) at ENS Cachan - Antenne de Bretagne. He is also the leader of the KerData Joint Team of INRIA Rennes - Bretagne Atlantique and ENS Cachan - Antenne de Bretagne. His research interests include the design and semantics of parallel programming languages and the management of data in very large distributed systems such as grids, clouds and peer-to-peer (P2P) networks.