Data Management in Cloud Workflow Systems

Dong Yuan

Faculty of Information and Communication Technology

Swinburne University of Technology

Outline

> Cloud Computing & Cloud Workflow Systems

– Introduction to cloud workflow systems and a brief overview of grid workflow systems

> Data Management in Cloud Workflow Systems

– New features and research issues

> Cloud Computing Environment and SwinDeW-C

– Our simulation environment and cloud workflow system

> Cloud Computing & Cloud Workflow Systems

Cloud Computing

> Some new features of cloud computing

– Large data centres with cheap hardware

– Virtualisation

– Internet-based and service-oriented (SOA)

• SaaS, PaaS, IaaS

– Market-driven, with a cost model

> Cloud computing research has emerged in many areas

– Data mining, databases, parallel computing and scientific applications, content delivery

Cloud Workflow Systems

> Grid workflow systems

– Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON

– Gridbus, GridFlow

> Build-time: focus on data modelling.

– Kepler: actor-oriented data modelling; Taverna: Scufl; ASKALON: AGWL

> Runtime: adopt Data Grid systems

– Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn

Cloud Workflow Systems

> Architecture

– Internet-based

– Platform as a Service

– More distributed

Figure: cloud workflow system architecture. Labels: User, Web Portal, Workflow Application, Workflow Specification, Platform, Unified Resources, Fabric, Cloud Service, Virtual Machine, Cloud Service Provider, Local Data Centre, Global Cloud.

> Data Management in Cloud Workflow Systems

Data Management in Cloud Workflow Systems

> New features and challenges

– Automatic and independent of users

– Cost-driven

• computation cost, storage cost, data transfer cost

– Data dependency

• Task – data, data – data, derivation

> Some research issues

– Data partition, placement, replication, synchronisation, provenance, catalogue, meta-data, consistency, reduction, storage, movement, etc.
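The cost-driven feature can be made concrete with a simple additive model: total cost = computation cost + storage cost + data transfer cost. The sketch below is a minimal illustration under that assumption; the `Prices` fields, the `total_cost` function, and the unit prices are hypothetical placeholders, not rates from any real provider or from SwinDeW-C.

```python
from dataclasses import dataclass

@dataclass
class Prices:
    """Hypothetical unit prices; real providers publish their own rates."""
    per_cpu_hour: float
    per_gb_month: float      # storing data
    per_gb_transfer: float   # moving data between data centres

def total_cost(cpu_hours: float, gb_months: float,
               gb_transferred: float, p: Prices) -> float:
    """Additive cost model: computation + storage + data transfer."""
    computation = cpu_hours * p.per_cpu_hour
    storage = gb_months * p.per_gb_month
    transfer = gb_transferred * p.per_gb_transfer
    return computation + storage + transfer

prices = Prices(per_cpu_hour=0.1, per_gb_month=0.15, per_gb_transfer=0.1)
print(total_cost(cpu_hours=10, gb_months=100, gb_transferred=50, p=prices))
```

In a model like this, storing more data raises the storage term but can lower the computation (regeneration) and transfer terms, which is the trade-off that placement, replication and intermediate data storage strategies have to balance.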

Data Placement in Cloud Workflow Systems

> Data placement: deciding where to store application data among the distributed data centres

> Aims:

– Reduce data movement

– Reduce task waiting time

> Strategy:

– Data dependency: dataset – dataset

– Build-time: existing data; runtime: generated data (including intermediate data)
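One way to act on dataset – dataset dependency is to count how many tasks use each pair of datasets and keep strongly dependent datasets in the same data centre, so that fewer tasks need cross-centre transfers. The greedy sketch below is a hypothetical illustration of that idea, not the placement algorithm used in SwinDeW-C; the dataset sizes, data-centre capacities, and the functions `dependency`, `place` and `cross_centre_tasks` are all assumptions.

```python
from itertools import combinations

def dependency(tasks):
    """Count, for each pair of datasets, how many tasks use both."""
    dep = {}
    for needed in tasks.values():
        for a, b in combinations(sorted(needed), 2):
            dep[(a, b)] = dep.get((a, b), 0) + 1
    return dep

def place(datasets, tasks, capacity):
    """Hypothetical greedy placement: put each dataset in the data centre
    whose already-placed datasets it depends on most, if it still fits.

    datasets: name -> size; tasks: name -> set of dataset names;
    capacity: data centre -> storage capacity (same unit as size).
    """
    dep = dependency(tasks)

    def score(ds, dc, placement):
        # Total dependency between ds and the datasets already in dc.
        return sum(dep.get(tuple(sorted((ds, other))), 0)
                   for other, where in placement.items() if where == dc)

    used = {dc: 0 for dc in capacity}
    placement = {}
    # Place larger datasets first, while there is still room for them.
    for ds in sorted(datasets, key=datasets.get, reverse=True):
        fits = [dc for dc in capacity if used[dc] + datasets[ds] <= capacity[dc]]
        if not fits:
            raise ValueError(f"no data centre has room for {ds}")
        best = max(fits, key=lambda dc: score(ds, dc, placement))
        placement[ds] = best
        used[best] += datasets[ds]
    return placement

def cross_centre_tasks(tasks, placement):
    """Tasks whose input datasets are spread over more than one data centre."""
    return sorted(t for t, needed in tasks.items()
                  if len({placement[d] for d in needed}) > 1)

datasets = {"d1": 10, "d2": 10, "d3": 10, "d4": 10}
tasks = {"t1": {"d1", "d2"}, "t2": {"d3", "d4"}, "t3": {"d1", "d2"}}
placement = place(datasets, tasks, {"dc1": 20, "dc2": 20})
print(placement, cross_centre_tasks(tasks, placement))
```

A single greedy pass can lock in poor early choices; clustering the whole dependency matrix at once is one alternative when build-time data is known in advance.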

Data Replication in Cloud Workflow Systems

> Data replication: storing several copies of one dataset in different data centres

> Aims:

– Increase data security

– Fast data access

– Reduce data movement

> Strategy:

– Dynamic replication.
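Dynamic replication can be illustrated with a simple popularity rule: once a data centre has read a dataset remotely a set number of times, a local replica is created there so that later reads avoid data movement. The class below is a hypothetical sketch; `DynamicReplicator`, its `threshold` rule, and the method names are assumptions, not SwinDeW-C's replication strategy.

```python
from collections import defaultdict

class DynamicReplicator:
    """Hypothetical popularity-based replication: after `threshold` remote
    reads of a dataset from one data centre, a replica is created there."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.replicas = defaultdict(set)      # dataset -> data centres holding a copy
        self.remote_reads = defaultdict(int)  # (dataset, data centre) -> remote read count

    def store(self, dataset, dc):
        """Record an initial (or additional) copy of dataset in dc."""
        self.replicas[dataset].add(dc)

    def read(self, dataset, dc):
        """Return True if served locally, False if the data had to move."""
        if dc in self.replicas[dataset]:
            return True
        self.remote_reads[(dataset, dc)] += 1
        if self.remote_reads[(dataset, dc)] >= self.threshold:
            self.replicas[dataset].add(dc)    # replica serves future reads
        return False

r = DynamicReplicator(threshold=2)
r.store("d1", "dc1")
print([r.read("d1", "dc2") for _ in range(3)])  # [False, False, True]
```

A fuller strategy would also weigh the storage cost of each new replica and delete replicas that stop being read.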

Intermediate Data Storage in Cloud Workflow Systems

> Intermediate data storage is especially important in scientific workflows

> Aim:

– Reduce system cost

> Strategy:

– Intermediate data can be regenerated with data provenance information

– Selectively store some key intermediate datasets
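The trade-off behind selective storage can be sketched for a linear chain of intermediate datasets. A deleted dataset is regenerated from the nearest stored predecessor using provenance information, so its regeneration cost grows with the number of deleted datasets before it; a dataset is worth storing when its storage cost per day is lower than its expected regeneration cost per day. This is a simplified, hypothetical rule: the `Dataset` fields and `choose_stored` are assumptions, and the single forward pass ignores that storing a dataset also makes regenerating its successors cheaper. It is not the author's published strategy.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    gen_cost: float      # cost to compute it from its direct predecessor
    storage_rate: float  # cost per day to keep it stored
    uses_per_day: float  # expected access frequency

def choose_stored(chain):
    """Pick datasets to store in a linear provenance chain (one forward pass)."""
    stored = []
    pending = 0.0  # generation cost of deleted datasets since the last stored one
    for ds in chain:
        regen = pending + ds.gen_cost          # cost to regenerate ds if deleted
        if ds.storage_rate < regen * ds.uses_per_day:
            stored.append(ds.name)             # cheaper to keep than to regenerate
            pending = 0.0
        else:
            pending = regen                    # successors must regenerate ds too
    return stored

chain = [
    Dataset("d1", gen_cost=10, storage_rate=1, uses_per_day=0.05),
    Dataset("d2", gen_cost=10, storage_rate=1, uses_per_day=0.04),
    Dataset("d3", gen_cost=5, storage_rate=1, uses_per_day=0.1),
]
print(choose_stored(chain))
```

Here d1 and d2 are rarely used and cheap enough to regenerate, so they are deleted, while d3 inherits their regeneration cost and ends up worth storing.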

> Cloud computing environment and SwinDeW-C

Simulation Cloud

Figure: simulation cloud. Labels: Swinburne Cluster, VMware, SwinDeW-C, Physical Machines Layer, Virtual Machines Layer, Applications Layer, Data Centres with Hadoop.

Web Portal

Related key system components of SwinDeW-C

> User Interface Module

> Data Management Module

– Data Placement Component, Data Replication Component, Intermediate Data Storage Component

– Data Catalogue

> Flow Management Module

– Process Repository

> Task Management Module

– Scheduler

> Resource Management Module

> …...

> Other components shown: Web Portal, Monitoring Component, Uploading Component, Meta-data Management Component, Provenance Data Collection

End

> Questions?