1
HST Pipeline Project Review
March 14, 2003
2
Review Objectives
Re-familiarize Project (and others) with production data processing done by STScI
Familiarize everyone with new processing hardware and how we plan to use it
Describe the steps we will be taking to shift development, I&T, and production data processing from the old systems to the new systems
3
Introduction
History – Long View
History – Last Year
Data processing requirements
Goals of this project
Overall plan
4
What do we mean by “data processing”?
Receipt of science and engineering data
Reformatting, quality checking, calibration, etc. needed to prepare data for the archive
Archiving the data
Retrieving the data
Processing and calibration of retrieved data
Sending data off to the user
User access tools
5
History – Long View
Original plan (1981)
TRW provides OSS and PODPS as two of three major pieces of SOGS
OSS to be used for real-time decision making
PODPS to process science data, including calibration, for users
STScI provides SDAS (analysis tools)
Established FITS format as basic science data format
Data provided to users on 1600/9600 bpi tapes
No archive
6
History – Long View
Pre-launch changes (1981-1990)
Astrometry and Engineering Data to come to STScI
PODPS to run STSDAS-based calibrations
STScI to develop CDBS (calibration data base system)
Archive activities started
STScI developed DMF, a prototype optical-disk-based archive, pressed into service ~L+1 year
DADS development started at Loral
StarView development started at STScI
7
History – Long View
Post-Launch changes I (1990-1996)
DADS delivered, data transferred from DMF to DADS
StarView released
OMS developed for engineering data and jitter files
OPUS replaced OSS and PODPS
Consolidated software systems
Important technology upgrade to support future growth
Pipeline development for STIS and NICMOS started
8
History – Long View
Post-Launch Changes II (1996-2001)
Data volume doubled with STIS and NICMOS
Archive utilization increased substantially
UNIX version of OPUS developed for FUSE
Archive upgraded
Magneto-optical media replaced optical disks
NSA project opened DADS architecture to multiple storage media
Spinning disks considered, but judged too expensive
CDBS re-implemented
OTFR deployed
Reduced archive volume
Provided up-to-date calibrations to users
9
History – Long View
10
History – Long View
Additional improvements and consolidations have been in our plans over the last few years
DADS evolution
Remove VMS dependencies
Make future technology migrations easier
Improve services based on community usage of HST archive
Replace OMS
Remove VMS dependency
Simplify system
ACS data and data processing
Increased volume
Drizzle algorithms for geometric correction and image co-addition
11
History – Last Year
Several parts of the system exhibited unacceptable performance
Processing of data from HST to the Archive
Response time to user requests for data from the Archive
Several specific causes
NFS mount problems
Disk corruption in OPUS
Jukebox problems
Other specific hardware problems
Symptomatic of more general problems with the data processing systems
12
History – Last Year
13
History – Last Year
14
History – Last Year
DADS Downtime Statistics
[Chart: monthly DADS downtime (%), May-02 through Jan-03; goal: <5% (1 day/week = 14%)]
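For reference, the chart's annotation is simple arithmetic on the stated goal (the 8.4-hour figure below is derived here, not taken from the slide):
1 full day of downtime per week = 1/7 ≈ 14.3%
The <5% goal therefore allows at most ~0.05 × 7 days ≈ 0.35 days ≈ 8.4 hours of downtime per week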
15
History – Last Year
Immediate steps were taken to upgrade available hardware
Added 6 CPUs and memory to Tru64 systems
Added CPUs and memory to Sun/Solaris systems
Added and reconfigured disk space
Large ACS data sets moved to an ftp site to avoid load on archive system
EROs and GOODS data sets
Ftp site off-loaded ~10 GBytes/day from archive in last several months (~20% effect)
16
Current status
System keeping up with demands
Running ~50% capacity on average
Loading in various places is very spiky
Instability of system, and diversion of resources, has put delivery of data to ECF and CADC substantially behind schedule
Expect load to increase in spring as ACS data become non-proprietary
17
ECF, CADC, NAOJ Data Transfer
[Chart: GB shipped over time to NAOJ, POD, and ECF/CADC, on a 0-90 GB scale]
18
Bulk distribution backlog
In absolute numbers: ~40,000 POD files
Archive Branch does not believe the current AutoBD can keep up with the current data volume, much less catch up
Implement ftp tool to augment transfer; tool accesses data on MO directly
May be able to bypass DADS by using safestores and development JB or stand-alone reader
Distribution re-design
CADC/ECF will be included as beta test sites in parallel operations starting ~April 1, 2003
New engine allows operators to prioritize requests
New engine supports transfer of compressed data
Consolidation of operating systems should improve reliability
With all these solutions, the preliminary estimate is that the backlog could be eliminated in a few months (see the sketch below)
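As a rough feasibility check on that estimate, here is a minimal Python sketch of the catch-up arithmetic; only the ~40,000-file backlog comes from the slide, and both rates are hypothetical placeholders, not measured AutoBD or ftp-tool throughput:

# Hypothetical catch-up estimate for the bulk-distribution backlog.
# Only BACKLOG_FILES is from the slide; both rates are assumptions.
BACKLOG_FILES = 40_000          # ~40,000 POD files behind
NEW_FILES_PER_DAY = 600         # assumed arrival rate of new POD files
SHIPPED_FILES_PER_DAY = 1_200   # assumed combined AutoBD + ftp-tool rate

net_per_day = SHIPPED_FILES_PER_DAY - NEW_FILES_PER_DAY
if net_per_day <= 0:
    print("Backlog never clears at these rates")
else:
    days = BACKLOG_FILES / net_per_day
    print(f"Backlog cleared in ~{days:.0f} days (~{days / 30:.1f} months)")

Under these made-up rates the backlog clears in roughly 67 days, i.e. on the order of a few months, which is the shape of estimate the slide is making.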
19
Data Processing Requirements
Performance requirements (Astronomy community expectations)
Data volume requirements
Into system from HST
Out of system to Astronomy community
Programmatic goals
Fit within declining HST budget at STScI
Expect archive to live beyond HST operational lifetime
Expect archive will be used to support JWST
20
Performance Requirements-I
Average time from observation execution to data receipt < 1 day
Average time from observation execution to data availability in archive < 2 days
98% of data available in archive in < 3 days
21
Performance Requirements-II
Archive availability: 95%
Median retrieval times
Defined as time from request to when data is ready for transmission; does not include transmission time
Non-OTFR data (not recalibrated): 5 hours
OTFR data (recalibrated): 10 hours
22
Performance Requirements-III
User support
Unlimited number of registered users
Support increased level of requests
Currently ~2000/month
Expect to grow at 20% per year (guess; see the worked figure below)
Reduce unsuccessful requests to <5%
Routinely handle highly variable demand
Daily request volume varies by more than factor of 10
Insulate pre-archive processing from OTFR load
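To put a number on that growth guess (the ~2000/month figure and 20%/year rate are from the slide; the five-year horizon is only an illustration):
2000 requests/month × 1.2^5 ≈ 2000 × 2.49 ≈ 5000 requests/month after five years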
23
Data Volume Requirements-I
Data volume from HST – now
Currently receive ~120 GBits/week from HST
Currently ingest ~100 GBytes/week into the archive
Currently handle ~2000 observations/week
Data volume from HST – after SM4
Expect ~200 GBits/week from HST
Expect to ingest ~160 GBytes/week into archive
Expect to handle ~2000 observations/week
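A quick unit check on those figures, reading "GBits" as gigabits (the usual unit for downlink volume); the expansion factor at the end is derived here, not stated on the slide:
120 Gbits/week ÷ 8 = 15 GBytes/week of raw telemetry now; 200 Gbits/week ÷ 8 = 25 GBytes/week after SM4
So the ~100 and ~160 GBytes/week ingest figures imply the archived products are roughly 6-7 times the downlinked volume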
24
Data Volume Requirements-II
Data distribution today
More than 300 GBytes/week from archive
More than 70 GBytes/week from ftp site
Data distribution projection
Distribution volume determined by world-wide Astronomy community – very unpredictable
Large increase expected as Cycle 11 data become non-proprietary
Should expect 500-1000 GBytes/week in a few years (rough arithmetic below)
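As a sanity check on that range (the ~370 GBytes/week current total and the 500-1000 GBytes/week target are from the slides; applying the 20%/year growth guess from the user-support requirements to distribution volume is an assumption):
~370 GBytes/week × 1.2^2 ≈ 530 GBytes/week after two years
~370 GBytes/week × 1.2^5 ≈ 920 GBytes/week after five years
which brackets the stated expectation.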
25
Programmatic Goals
Reduce total cost of data processing activities
Simplify hardware and network architecture
Reduce Operating Systems from 3 to 1
Terminate use of VMS and Tru64
Eliminate passing of data through various OSs
Consolidate many boxes into two highly reliable boxes
Flexible allocation of computing resources
Support easy re-allocation of CPU and Disk resources among tasks
Provide simple growth paths, if needed
26
Current Architecture
[Diagram: current architecture spread across separate VMS, Tru64, and Solaris systems]
27
Programmatic Goals
Provide common development, test, and operational environments
Current development and test systems cannot replicate load of operational systems
Reduce complexity of development and test environments (drop VMS, Tru64)
Improve ability to capture performance data, metrics, etc.
Current systems too diverse
Difficult to transfer performance measurements made on development/test systems to operations
28
Current Development and I&T Environment
[Diagram: today's development and I&T environment spans many individually named Alpha/OpenVMS and Tru64 hosts (Nomad, Robbie, Scarab, the ODO cluster, Barge, Corsair, Aardvark, Artichoke, Liner, Jboat, Canoe, John (EDP-Dev), Paul (IDR-Beta), George (IDR-Dev), Ringo (EDP-Ops)), plus jukeboxes, a tape drive, CD burners, and ~1 TB of disk, serving the development and test teams, the Institute community, and operations]
29
New Architecture
[Diagram: a SUN FIRE 15K and EMC storage array with EMC and Sun backup managers, connected by a 2 GB switch and Fibre Channel, with Ethernet 10/100 connections to the Institute network and Internet; data flows in from HST and from the science community (through StarView) and back out to the science community; existing hosts (Fatkat, Imagemaker, ARCHC, Scarab, Corsair), ops and dev jukeboxes, DVD drives, and CD burners remain attached, and the FUSE engineering team is also served]
SUN FIRE 15K Domain Config
[Diagram: 7 dynamically re-configurable domains, each backed by EMC storage, with a link to the test and development environment: OPUS/Archive OPS, Databases OPS (two domains), Code Development, System Test, Database Test, and OS/Security Test]
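To illustrate what the dynamically re-configurable domains buy us, here is a minimal toy sketch in Python; the domain names come from the slide, but the board counts and the move_board helper are purely hypothetical and are not the actual Sun domain-management tooling:

# Toy model of shifting capacity between Sun Fire 15K domains.
# Domain names are from the slide; the board counts are made up.
domains = {
    "OPUS/Archive OPS": 6,
    "Databases OPS (1)": 4,
    "Databases OPS (2)": 4,
    "Code Development": 1,
    "System Test": 2,
    "Database Test": 1,
    "OS/Security Test": 1,
}

def move_board(src: str, dst: str) -> None:
    """Re-assign one CPU/memory board from one domain to another."""
    if domains[src] <= 1:
        raise ValueError(f"{src} must keep at least one board")
    domains[src] -= 1
    domains[dst] += 1

# Example: borrow a board from System Test to absorb an OTFR load spike.
move_board("System Test", "OPUS/Archive OPS")
print(domains)

The point is simply that capacity moves between tasks by re-assigning resources within one box, rather than by buying or re-cabling separate machines.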
30
Programmatic Goals
Continue planned pipeline evolution
DADS Distribution redesign provides more flexibility to users and operators
Reflect advent of OTFR
Reflect community utilization of the archive
Provide operators more control over priority and loadings
Storing copy of raw data on EMC will dramatically reduce load and reliance on jukeboxes
Ingest redesign provides opportunity to finally end the arbitrary boundary between OPUS and DADS
31
Programmatic Goals
Future growth paths for HST
To first order, we expect HST to live within the capabilities of this architecture through SM4 to EOL
Input data volume will increase some, but not a lot
Plan to adjust distribution techniques and user expectations to live within the 15K/EMC resources
However, we will encourage ever more and better use of HST science data
Beyond HST End-of-Life
HST data distribution would need to be revisited based on utilization at the time (seven years from now) and progress of NVO initiatives
Architecture is planned starting point for JWST; hardware is very likely to need major upgrades
32
Remainder of the Review
Architecture
New hardware (Sunfire 15K, EMC)
What it is, how it works
Steps to make it operational
Moving development, I&T, databases
Moving operational processing
OPUS processing
Raw data off Jukeboxes onto EMC
Archive software upgrades