About the need of 3 Tier storage

17
Grid Lab About the need of 3 Tier storage 5/22/12 1 CHEP 2012, The need of 3 Tier storage Dmitri Ozerov Patrick Fuhrmann CHEP 2012, NYC, May 22, 2012 Grid Lab Partially evaluated by

description

About the need of 3 Tier storage. Dmitri Ozerov Patrick Fuhrmann. CHEP 2012, NYC, May 22, 2012. Partially evaluated by. Grid Lab. Content. Profile of a “large” WLCG Tier 2. Efficiency and performance of Storage Media. Replaying billing information. Simple cache filling mechanism. - PowerPoint PPT Presentation

Transcript of About the need of 3 Tier storage

Page 1: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 1

About the need of 3 Tier storage

5/22/12

Dmitri OzerovPatrick Fuhrmann

CHEP 2012, NYC, May 22, 2012

Grid Lab

Partially evaluated by

Page 2: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 2

• Profile of a “large” WLCG Tier 2.• Efficiency and performance of Storage Media.• Replaying billing information.• Simple cache filling mechanism.• Improved mechanism.• Summary

5/22/12

Content

Page 3: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 3

About the need of 3 Tier storage

5/22/12

SpinningDisk

SSD

Tertiary Storage (Tape)

SpinningDisk

Tertiary Storage (Tape)

Page 4: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 45/22/12

Our evaluation results are most relevant for storage providers of Tier 2/3.

The WLCG 3 Tier Model

SpinningDisk

Tertiary Storage (Tape)

SpinningDisk

Tier 0 / 1

Tier 2

Page 5: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 55/22/12

Activities of a large Tier 2

• Most of the clients are using dCap. • Data is read directly from storage.• We found a significant difference in transfer speed between dCap and gridFTP.

(From few MB/Sec to 30 MB/s due to different access pattern)

Page 6: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 65/22/12

Partially reading files

• 870 TB of data was transferred over the network; files were only

partially read.

• The transferred data corresponds to 24 PB, if the entire file would

have been transferred over the wire.

• The average network bandwidth: 0.33 GB/sec -> 9 GB/sec

• Load on the storage moves from dedicated effective storage pools

to local space on WN

Is it more efficient to copy data to WN storage, to be processed locally ?

CMS User group statistics : Nov 2011

Page 7: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 75/22/12

Efficiency of storage hardware

Investigations were made with the DESY Grid Lab facility

(For details, please see our poster 503)

Processing I/O intensive ATLAS Hammercloud jobs revealed two limitations:

50 Jobs / 100 TB – efficiency of jobs drops by 30%

Running more than 80 Jobs / 100TB is a waste of CPU resources.

Page 8: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 85/22/12

Storage media performance

1. Traditional Disk (HDD) – good in streaming, bad in random reads SSD -> 10 times faster in random reading

2. Copying the entire file to the WN instead of directly accessing the data will be affected significantly due to the increased number of reads.

Page 9: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 95/22/12

SSD

SpinningDisk

Simple Strategy

1. Get file-A from disk2. Asynchronously copy the

same file to SSD storage, removing “least recently used” files in case space is running short.

SSD

SpinningDisk

1. Next read will get file-A directly from fast SSD storage as long as file is still present.

Page 10: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 105/22/12

How was the analysis done ?

DESYStorageElement

Billing FileOr

Billing DB

9:00 transfer 100.009:00 transfer 100.009:00 transfer 100.009:00 transfer 100.00

9:00 transfer 100.00

Records eachsingle transfer

with time, file size etc.

SimulationOf

Variousscenarios

Page 11: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 115/22/12

Result:

• 20 Tb Cache Size : 70% of traffic to clients is originated from SSD. • 380TB was streamed in total (with 140 Mbyts/sec )

Re-play of Storage System billing info

1st November 2011:

The cache is completely empty

• Read file from cache if present• If not -> read from disk:

copy to cache if no space left, remove "least accessed"

file.

Concerning bytes transferred

Page 12: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 125/22/12

Re-play of Storage System billing info

• Read file from cache if present• If not -> read from disk:

copy to cache if no space left, remove "least accessed"

file.

Result:

• 20 Tb Cache Size : 70% of traffic to clients is originated from SSD. • 380TB was streamed in total (with 140 Mbyts/sec )• Only 4% of reads from SSD

1st November 2011:

The cache is completely empty

Concerning # of reads

Page 13: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 135/22/12

Agnostic against abnormal activity

To prove that the evaluation has not been mislead by abnormal activities in the observed time frame (e.g. user code bugs or specific jobs only seen those months) an additional month was analyzed.

Page 14: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 145/22/12

Results from a less active month

With 20 TB of cache, 80% of reads are redirected to the cache

Page 15: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 155/22/12

SSD

SpinningDisk

Improved Strategy

1. Get file-A from disk2. Read the file up to 10

times3. Asynchronously copy the

same file to SSD storage with the 10th read

SSD

SpinningDisk

1. Next read will get file-A directly from fast SSD storage

Copywith 10th

access

Page 16: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 165/22/12

Copy to SSD cache after 10 file accesses

With the improved (tunable) algorithm, we are able to even reduce data transfer, while keeping high cache access rates.

Page 17: About the need of 3 Tier storage

Grid Lab

CHEP 2012, The need of 3 Tier storage 175/22/12

Summary

• Storage systems based on traditional spinning disks have a limited I/O performance.

• Adding SSDs, as an additional cache layer in currently deployed storage solutions (DPM, dCache..) can boost I/O performance significantly.

• An analysis of CMS user activities has shown that only 10% of cache, results in 70% of random reads from the cache.

• A storage system with 3 Tiers is transparent for the user and cost effective for sites.