Yellowstone and HPSS
Or: What you're doing on Bluefire that you should stop
David Hart, CISL User Services Section
August 7, 2012
Think different
• Yellowstone is not Bluefire
  – Yellowstone delivers 29x the computing capacity
• GLADE is not /ptmp
  – New /glade/scratch (~5 PB) is:
    • 37x larger than /ptmp
    • 25x larger than old /glade/scratch
  – New GLADE is 7x larger, 15x faster than old GLADE
• HPSS tape capacity is not infinite
Tape ≠ slow disk
• "Temporary" HPSS files use tape space that is not easily reclaimed
  – Deleting files from tape leaves gaps that are not refilled (unlike disk)
  – "Repacking" is not practical
    • Time consuming, may recover only 10% of tape space, occupies tape drives
  – Space is recovered only* when the entire archive is migrated to new media
• Wasted tape = smaller future HPC systems
HPSS today

HPSS, May 2012: 15.75 PB
[Pie chart of holdings by entity; labels show entity, PB, and %. NCAR Labs: 47%; CSL: 28%.]
HPSS growth, May 2011-May 2012
• 12.8 PB (May 2011) → 15.75 PB (May 2012)
  – +3 PB in one year, ~23% growth overall
  – ~70 TB added every week
• Largest increases
  – CGD: 741 TB
  – CESM: 624 TB
  – RDA: 374 TB
  – University: 302 TB
  – RAL: 295 TB
• NCAR Lab holdings grew 1.6 PB
  – Excluding CESM and other CSL activity
  – From 5.96 PB to 7.58 PB (+27%)
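The growth figures above can be sanity-checked with quick arithmetic; a sketch, using only the PB totals quoted on this slide:

```shell
# Check the year-over-year HPSS growth figures quoted above.
awk 'BEGIN {
  start = 12.8;  end = 15.75            # total holdings, PB
  delta = end - start
  printf "total: +%.2f PB (%.0f%%)\n", delta, 100 * delta / start

  lab0 = 5.96;  lab1 = 7.58             # NCAR Lab holdings, PB
  printf "NCAR labs: +%.2f PB (%.0f%%)\n", lab1 - lab0, 100 * (lab1 - lab0) / lab0
}'
```

This reproduces the slide's rounded numbers: +2.95 PB (23%) overall and +1.62 PB (27%) for the NCAR labs.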
HPSS by 2014: Hit the wall running…

Potential HPSS growth by Jan 2014
[Chart of projected growth; Jan 2014 is only ~15 months away!]
HPSS allocations
• CISL's new accounting system
  – Lets us set HPSS allocations
  – Helps you more easily monitor your holdings
• We reduced allocations for CSL awardees and CHAP awardees
• We will set a "budget" for NCAR labs, too
HPSS holdings, Jan 2014 (projected)
• 30+ PB of data
• 200M+ files
Action items
1. Cleaning house
• USERS: An opportune time to delete old files
  – CISL will eventually migrate current holdings to new media
  – Help us avoid migrating unnecessary data
  – Closing old projects will provide you with details about files associated with those projects
• CISL: Convert dual-copy files to single-copy
  – Recovers ~3 PB of space, mostly older MSS files (where dual-copy was the default)
  – Since moving to HPSS, the net amount of dual-copy data has decreased by 44 TB
Limits to HPSS deletion
[Chart: HPSS holdings by year, in PB; category amounts estimated.]
This represents the upper limit of possible deletions, since some files may already have been removed.
2. Manual second copies
• Eliminating the dual-copy class of service in favor of a backup area for user-managed second copies
  – Currently the approach used by the Research Data Archive (RDA)
  – Advantages:
    • Guarantees the second copy is on different media
    • Reduces confusion over dual-copy limitations
    • Protects against user error (not true of the dual-copy CoS!)
      – Removals or overwrites of the original won't clobber the second copy
    • Changing your mind consumes less tape this way
      – And costs less!
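In practice, a user-managed second copy is just an explicit extra write into a backup area. A minimal sketch, assuming hsi as the HPSS client; the paths and the backup-directory layout are hypothetical, so check CISL documentation for the actual conventions:

```shell
# Hypothetical paths; the backup-area layout is an assumption, not CISL policy.
SRC=/home/jdoe/run01/output.tar        # primary copy already in HPSS
DST=/home/jdoe/backup/output.tar       # user-managed second copy

# Build the hsi server-side copy command; run it yourself when ready.
# (hsi's cp copies HPSS-to-HPSS without staging data to your workstation.)
echo hsi "cp $SRC $DST"
```

Because the second copy is an ordinary file you wrote, deleting or overwriting the primary never touches it, which is exactly the protection the dual-copy class of service could not provide.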
3. Think "archive"
• "A long-term storage area, often on magnetic tape, for backup copies of files or for files that are no longer in active use." – American Heritage Dictionary
• "Records or documents with historical value, or the place where such records and documents are kept."
• "To transfer files to slower, cheaper media (usually magnetic tape) to free the hard disk space they occupied. … [I]n the 1960s, when disk was much more expensive, files were often shuffled regularly between disk and tape." – Free On-Line Dictionary of Computing
Updated GLADE policies for Yellowstone
• /glade/scratch (5 PB total)
  – 90-day file retention from last access
  – 10 TB default quota
    • If you need more, ask.
  – Use it! But use it responsibly: don't let large piles of data sit untouched for 88 days
  – We will rein in the 90-day window, if needed
• /glade/work (1 PB total)
  – 500 GB default quota for everyone
  – No purging or scrubbing!
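One way to stay ahead of the 90-day scratch purge is to look periodically for files whose last access is approaching the limit. A sketch, assuming GNU find and the per-user layout /glade/scratch/$USER (the path is an assumption; adjust it to your setup):

```shell
# Path is an assumption; point SCRATCH at your actual scratch directory.
SCRATCH=${SCRATCH:-/glade/scratch/$USER}

# List files not accessed in 80+ days, largest first, so they can be
# reused, moved, or deleted before the 90-day purge reaches them.
find "$SCRATCH" -type f -atime +80 -printf '%s %p\n' 2>/dev/null \
  | sort -rn | head -20
```

Sorting by size first surfaces the "large piles of data" the slide warns about.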
Optimize your workflows
• Don't use tape for data/files you know are temporary or interim
• Plan ahead
• Leave temporary data in /glade/scratch
• Post-process to final form before archiving
• Take advantage of LSF-controlled Geyser and Caldera to automate post-processing tasks
4. Monitor off-site data
• HPSS sizing and plans are estimated based on the size of CISL's production HPC resources
• They do not account for NCAR data produced on Hopper, Intrepid, Jaguar, Kraken, Pleiades, Blue Waters, Stampede …
  – HPC sites are pushing the data problem around
• Projects and labs need to be aware of their users' data migrating to NCAR from off-site
  – Factor this into local data management plans
5. Plan ahead
• CISL is working with B&P on whether to formalize tape storage needs and costs associated with proposal activity
• Most important for projects with plans to store "significant" amounts of data
  – How much is "significant" is TBD, but amounts that can be described in tenths of petabytes or more probably qualify
  – If this applies to you, CISL can provide a cost for tape storage to include in your co-sponsorship budget
Looking ahead… 1 year
• GLADE will expand by ~5 PB in Q1 2014
• How would you like to take advantage of the new disk?
  – Near-term, online backup
    • E.g., an area for 6-month "insurance" copies
  – Longer scratch retention
  – Larger permanent "work" space
  – Other ideas?
• HPSS procurement for the next-generation archive is in the planning stages
Looking ahead… 3-4 years
• Recap: Yellowstone may lead to 25+ PB per year stored in HPSS
• The successor to Yellowstone may be 10+ times more powerful
  – Anywhere from 15-40 Pflops, likely with GPU, MIC, or other many-core accelerators
• Can we afford to maintain and manage 10x the HPSS storage?
  – 250 PB per year, i.e., 0.25 exabyte per year?
Questions?