A Workflow-Aware Storage System

Transcript of A Workflow-Aware Storage System

Page 1: A Workflow-Aware Storage System


A Workflow-Aware Storage System

Emalayan Vairavanathan

Samer Al-Kiswany, Lauro Beltrão Costa, Zhao Zhang, Daniel S. Katz, Michael Wilde, Matei Ripeanu

Page 2: A Workflow-Aware Storage System


Workflow Example - ModFTDock

• Protein docking application

• Builds a model of a larger protein complex from two known proteins

• Applications

Drug design

Protein interaction prediction

Page 3: A Workflow-Aware Storage System

Background – ModFTDock on the Argonne BG/P

Scale: 40,960 compute nodes running 1.2 M docking tasks

File-based communication through the backend file system (e.g., GPFS, NFS)

Large IO volume; aggregate IO rate of 8 GB/s, i.e., roughly 51 KB/s per core (40,960 nodes × 4 cores per node)

[Figure: the workflow runtime engine dispatches application tasks to the compute nodes, each with local storage; all tasks read and write through the shared backend file system]

Page 4: A Workflow-Aware Storage System

Background – Backend Storage Bottleneck

• Storage is one of the main bottlenecks for workflows

Montage workflow (512 BG/P cores, GPFS backend file system):

[Chart: breakdown of workflow time – data management 30%, execution 29%, scheduling and idle 40%]

Source: [Zhao et al.]

Page 5: A Workflow-Aware Storage System

Intermediate Storage Approach

Scale: 40,960 compute nodes

The compute nodes' local storage is aggregated into a shared intermediate storage space that application tasks access through a POSIX API; data is staged in from, and staged out to, the backend file system (e.g., GPFS, NFS)

[Figure: workflow runtime engine and application tasks on the compute nodes, reading and writing the shared intermediate storage]

Source: [Zhao et al.], MTAGS 2008

Page 6: A Workflow-Aware Storage System


Research Question

How can we improve the storage performance for workflow applications?

Page 7: A Workflow-Aware Storage System


IO-Patterns in Workflow Applications (Justin Wozniak et al., PDSW'09)

• Pipeline – locality and location-aware scheduling

• Broadcast – replication

• Reduce – collocation and location-aware scheduling

• Scatter and Gather – block-level data placement
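To make these patterns concrete, here is a minimal sketch (illustrative Python, not from the slides; all task and file names are made up) that expresses each pattern as task → (input files, output files) relationships:

```python
# Illustrative only: each helper returns a list of (task, inputs, outputs) tuples
# describing who produces and who consumes each file.

def pipeline(stages=3):
    # Each task consumes the file produced by the previous task
    # (ideally both run on the same node).
    return [(f"stage_{i}", [f"file_{i}"], [f"file_{i + 1}"]) for i in range(stages)]

def broadcast(consumers=4):
    # One file is produced once and read by many tasks (benefits from replication).
    return [(f"consumer_{i}", ["common_input"], [f"out_{i}"]) for i in range(consumers)]

def reduce_pattern(producers=4):
    # Many files, produced by different tasks, are all read by a single task
    # (benefits from collocating the inputs on one storage node).
    return [("reducer", [f"part_{i}" for i in range(producers)], ["merged"])]

def scatter(workers=4):
    # Each task reads a different block of one large file
    # (benefits from block-level data placement).
    return [(f"worker_{i}", [("big_input", i)], [f"piece_{i}"]) for i in range(workers)]

if __name__ == "__main__":
    for name, tasks in [("pipeline", pipeline()), ("broadcast", broadcast()),
                        ("reduce", reduce_pattern()), ("scatter", scatter())]:
        print(name, tasks)
```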

Page 8: A Workflow-Aware Storage System

IO-Patterns in ModFTDock

• 1.2 M Dock, 12,000 Merge and Score instances in a large run

• Average file size: 100 KB – 75 MB

Stage 1: broadcast pattern

Stage 2: reduce pattern

Stage 3: pipeline pattern

[Figure: the ModFTDock workflow and the IO pattern of each stage]
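As a rough illustration only, the sketch below (hypothetical; not from the deck) writes out the three stages as producer/consumer relationships over files. The 100:1 dock-to-merge fan-in is merely inferred from the instance counts above (1.2 M dock vs. 12,000 merge), and the file names are invented:

```python
# Hedged sketch: ModFTDock's three stages as producer/consumer relationships.
# The fan-in of 100 dock outputs per merge task is an inference, not a stated fact.

def modftdock_tasks(n_dock=1000, fan_in=100):   # the large run uses n_dock = 1_200_000
    tasks = []
    # Stage 1 (broadcast): every dock task reads the same shared input.
    for i in range(n_dock):
        tasks.append(("dock", [f"pair_{i}", "shared_input"], [f"dock_{i}.out"]))
    for j in range(n_dock // fan_in):
        # Stage 2 (reduce): each merge task reads a whole group of dock outputs.
        group = [f"dock_{j * fan_in + k}.out" for k in range(fan_in)]
        tasks.append(("merge", group, [f"merge_{j}.out"]))
        # Stage 3 (pipeline): each score task consumes exactly one merge output.
        tasks.append(("score", [f"merge_{j}.out"], [f"score_{j}.out"]))
    return tasks
```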

Page 9: A Workflow-Aware Storage System


Research Question

How can we improve the storage performance for workflow applications?

Our answer – workflow-aware storage: optimize the storage for the IO patterns

Traditional approach: one size fits all

Our approach: file- and block-level optimizations

Page 10: A Workflow-Aware Storage System


Integrating with the workflow runtime engine

The compute nodes' local storage forms a shared workflow-aware storage space, accessed through a POSIX API; data is staged in from, and staged out to, the backend file system (e.g., GPFS, NFS)

Bi-directional hints:

• Application hints (e.g., indicating access patterns) flow from the workflow runtime engine to the storage

• Storage hints (e.g., location information) flow from the storage back to the workflow runtime engine

[Figure: workflow runtime engine, application tasks on the compute nodes, and the shared workflow-aware storage exchanging hints]
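The deck does not show the hint mechanism itself. One plausible way to pass such hints through a POSIX interface is via extended attributes on files, as in the hypothetical sketch below; the attribute names (user.wass.pattern, user.wass.location) are assumptions, not the system's actual API:

```python
import os

# Hypothetical attribute names; the real system's hint interface may differ.
PATTERN_ATTR = "user.wass.pattern"     # application hint: intended access pattern
LOCATION_ATTR = "user.wass.location"   # storage hint: node(s) holding the file

def tag_access_pattern(path: str, pattern: str) -> None:
    """Runtime -> storage: declare the file's access pattern (e.g., 'pipeline')."""
    os.setxattr(path, PATTERN_ATTR, pattern.encode())

def file_location(path: str) -> str:
    """Storage -> runtime: ask where the file's data lives, so the consuming task
    can be scheduled on (or near) that node."""
    return os.getxattr(path, LOCATION_ATTR).decode()

# Usage (Linux only; requires a file system that exposes user extended attributes):
#   tag_access_pattern("/intermediate/dock_42.out", "pipeline")
#   node = file_location("/intermediate/dock_42.out")
```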

Page 11: A Workflow-Aware Storage System


Outline

• Background

• IO Patterns

• Workflow-aware storage system: Implementation

• Evaluation

Page 12: A Workflow-Aware Storage System


Implementation: MosaStore

• Files are divided into fixed-size chunks

• Chunks are stored on the storage nodes

• Manager maintains a block-map for each file

• POSIX interface for accessing the system

[Figure: MosaStore distributed storage architecture]
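A minimal sketch of the manager-side metadata described above (illustrative Python, not MosaStore code; the 1 MiB chunk size is an assumption):

```python
from dataclasses import dataclass, field
from typing import Dict, List

CHUNK_SIZE = 1 << 20   # chunk size assumed for this sketch; the slide only says "fixed size"

@dataclass
class Chunk:
    chunk_id: int
    locations: List[str]              # storage nodes holding a copy of this chunk

@dataclass
class FileEntry:
    size: int = 0
    block_map: List[Chunk] = field(default_factory=list)   # ordered chunk list

class Manager:
    """Holds the per-file block-map; clients then contact storage nodes directly."""
    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}

    def append_chunk(self, path: str, chunk_id: int, nodes: List[str], length: int) -> None:
        entry = self.files.setdefault(path, FileEntry())
        entry.block_map.append(Chunk(chunk_id, list(nodes)))
        entry.size += length

    def block_map(self, path: str) -> List[Chunk]:
        return self.files[path].block_map
```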

Page 13: A Workflow-Aware Storage System


Implementation: Workflow-aware Storage System

[Figure: workflow-aware storage architecture]

Page 14: A Workflow-Aware Storage System


Implementation: Workflow-aware Storage System

• Optimized data placement for the pipeline pattern

Priority to local writes and reads

• Optimized data placement for the reduce pattern

Collocating files in a single storage node

• Replication mechanism optimized for the broadcast pattern

Parallel replication

• Exposing file location to the workflow runtime engine (see the placement sketch below)
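The sketch below shows, in illustrative pseudologic rather than the system's actual code, how these per-pattern policies could drive chunk placement; the replication factor of three for broadcast files is an assumption:

```python
import random
from typing import List, Optional

def place_chunk(pattern: str, writer_node: str, all_nodes: List[str],
                reduce_target: Optional[str] = None) -> List[str]:
    """Return the storage node(s) that should receive a chunk of the file being written."""
    if pattern == "pipeline":
        # Keep data where it is produced, so the consumer can be scheduled there.
        return [writer_node]
    if pattern == "reduce":
        # Collocate every file of the reduce group on a single designated node.
        return [reduce_target or writer_node]
    if pattern == "broadcast":
        # Replicate so that many readers do not all hit one node
        # (a replication factor of 3 is an assumption for this sketch).
        others = [n for n in all_nodes if n != writer_node]
        return [writer_node] + random.sample(others, k=min(2, len(others)))
    # No hint: fall back to a default (random) placement.
    return [random.choice(all_nodes)]

# Example: place_chunk("broadcast", "node07", [f"node{i:02d}" for i in range(20)])
```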

Page 15: A Workflow-Aware Storage System


Outline

• Background

• IO Patterns

• Workflow-aware storage system: Implementation

• Evaluation

Page 16: A Workflow-Aware Storage System


Evaluation - Baselines

Baselines: MosaStore (deployed as shared intermediate storage), NFS (the backend file system), and node-local storage, compared against the workflow-aware storage system

[Figure: compute nodes running application tasks over local storage and a shared intermediate storage, with stage in/out to the backend file system (e.g., GPFS, NFS)]

Page 17: A Workflow-Aware Storage System


Evaluation - Platform

• Cluster of 20 machines: Intel Xeon 4-core 2.33-GHz CPU, 4-GB RAM, 1-Gbps NIC, and RAID-1 on two 300-GB 7,200-rpm SATA disks

• Backend storage: NFS server with an Intel Xeon E5345 8-core 2.33-GHz CPU, 8-GB RAM, 1-Gbps NIC, and six SATA disks in a RAID-5 configuration

The NFS server is better provisioned than the cluster nodes

Page 18: A Workflow-Aware Storage System


Evaluation – Benchmarks and Application

Synthetic benchmarks – file sizes per pattern:

Workload   Pipeline                 Broadcast        Reduce
Small      100 KB, 200 KB, 10 KB    100 KB, 1 KB     10 KB, 100 KB
Medium     100 MB, 200 MB, 1 MB     100 MB, 1 MB     10 MB, 200 MB
Large      1 GB, 2 GB, 10 MB        1 GB, 10 MB      100 MB, 2 GB

Application (run through the workflow runtime engine): ModFTDock
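For concreteness, here is a hedged sketch of what a single pipeline-benchmark stage could look like (hypothetical harness, not the actual benchmark code; interpreting the three pipeline sizes as input, intermediate, and output file sizes is an assumption):

```python
import os
import time

def run_stage(in_path: str, out_path: str, out_size: int) -> float:
    """Read the whole input file, then write out_size bytes of output; return seconds."""
    start = time.time()
    with open(in_path, "rb") as f:              # consumer side of the pipeline stage
        while f.read(1 << 20):                  # stream the input in 1 MiB reads
            pass
    block = os.urandom(1 << 20)                 # 1 MiB of data, reused for the output
    written = 0
    with open(out_path, "wb") as f:             # producer side for the next stage
        while written < out_size:
            chunk = block[: min(len(block), out_size - written)]
            f.write(chunk)
            written += len(chunk)
    return time.time() - start

# Medium pipeline workload from the table above: 100 MB input -> 200 MB -> 1 MB
#   run_stage("/intermediate/in.dat",  "/intermediate/mid.dat", 200 * 1024**2)
#   run_stage("/intermediate/mid.dat", "/intermediate/out.dat", 1 * 1024**2)
```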

Page 19: A Workflow-Aware Storage System


Synthetic Benchmark - Pipeline

Optimization: Locality and location-aware scheduling

[Chart: average runtime for the medium pipeline workload]

Page 20: A Workflow-Aware Storage System


Synthetic Benchmarks - Reduce

Optimization: Collocation and location-aware scheduling

[Chart: average runtime for the medium reduce workload]

Page 21: A Workflow-Aware Storage System

Synthetic Benchmarks - Broadcast

Optimization: Replication

[Chart: average runtime for the medium broadcast workload]

Page 22: A Workflow-Aware Storage System


Not everything is perfect!

[Chart: average runtime for the small workload (pipeline, broadcast, and reduce benchmarks)]

Page 23: A Workflow-Aware Storage System


Evaluation – ModFTDock

[Figure: the ModFTDock workflow]

[Chart: total application time on three different systems]

Page 24: A Workflow-Aware Storage System


Evaluation – Highlights

• WASS shows considerable performance gains on all benchmarks for the medium and large workloads (up to 18x faster than NFS and up to 2x faster than MosaStore).

• ModFTDock is 20% faster on WASS than on MosaStore, and more than 2x faster than running on NFS.

• WASS provides lower performance with small benchmarks due to metadata overheads and manager latency.

Page 25: A Workflow-Aware Storage System


Summary

Problem

• How can we improve the storage performance for workflow applications?

Approach

• Workflow-aware storage system (WASS)

From backend storage to intermediate storage

Bi-directional communication using hints

Future work

• Integrating more applications

• Large-scale evaluation

Page 26: A Workflow-Aware Storage System

THANK YOU

MosaStore: netsyslab.ece.ubc.ca/wiki/index.php/MosaStore

Networked Systems Laboratory: netsyslab.ece.ubc.ca