Scheduled Scientific Data Releases Using .backup Volumes

May 23rd 2008, Chris Kurtz and Zach Schimke, Mars Space Flight Facility, Arizona State University

Description

How the Mars Space Flight Facility uses (and abuses) the .backup snapshot feature of OpenAFS. This presentation was given by Chris Kurtz and me at the 2008 AFS and Kerberos Best Practices Workshop in Newark, NJ.

Transcript of Scheduled Scientific Data Releases Using .backup Volumes

Page 1: Scheduled Scientific Data Releases Using .backup Volumes

May 23rd 2008

Chris Kurtz

Zach Schimke

Mars Space Flight Facility

Arizona State University

Page 2: Outline

Introduction: The Mars Space Flight Facility

Spacecraft Data and You

Image Processing

The Problem: Released and Unreleased Data

The Solution: AFS and .backups

Overview of MSFF use of AFS

Feature Requests

Questions

Page 3: Introduction

NASA/Jet Propulsion Lab funded research institution

Scientists, Mission Planners, Developers, SysAdmins

Four instruments on Mars:

TES (Thermal Emission Spectrometer) on Mars Global Surveyor (1996-2006)

THEMIS (THermal EMission Imaging System) on Mars Odyssey (2001 to present)

Mini-TES on the MER rovers Spirit and Opportunity (2004 to present)

Over 80 TB of collected mission data (including AFS)

Page 4: Spacecraft Data and You

Instrument captures data on Mars

Spacecraft combines data from all instruments, adds spacecraft telemetry, and sends it to Earth via radio, to be received by the DSN (Deep Space Network)

JPL correlates, decodes, and packages data for each instrument

MSFF pulls the raw data for its instrument from JPL

MSFF processes the data through multiple steps

Page 5: Spacecraft Data and You: THEMIS Data Types

[Figure: three image strips, labeled IR NIGHT, IR DAY, and VISIBLE]

THEMIS Data Types:

Infrared (IR): 100m per pixel, daytime and nighttime images

Visible Light (VIS): 18m per pixel

Page 6: Image Processing

Processing pipeline: Raw (EDR) → Calibrated (RDR) (2x) → Projected (GEO) (4x)

SFDU: Standard Formatted Data Unit

EDR: Experiment Data Record

RDR: Reduced Data Record

GEO: Geometrically Registered Record

Page 7: Image Processing

Due to the volume of data, two 100-CPU Linux clusters are used for processing, and the resulting products are stored on a high-end NFS server from Network Appliance

These data products are made available to Science Team members immediately via authenticated services

The JPL contract requires data to be released to the public 6 months after being received (to give Operations time to validate, calibrate, process, perform scientific analysis, etc.): this is the crux of the problem

Page 8: Image Processing

Snow and Ice in Udzha Crater (VIS – False Color)

Image Credit: NASA/JPL/ASU

Page 9: Image Processing

Hematite in Meridiani Planum (IR – False Color)

Image Credit: NASA/JPL/ASU

Page 10: The Problem: Released and Unreleased Data

There is a 6-month grace period between data collection and public release

The previous methodology was to copy over 25 TB of data via rsync from internal NFS to stand-alone web server(s)...This had issues:

It took forever just to build the file list

The rsync itself took days

Releases took longer and longer (we regularly re-process old data with updated calibration, so we have to re-release it)

Webservers needed fast, expensive, redundant disk
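For scale, the old push was essentially a plain rsync mirror of the released subset of the archive. The sketch below is a reconstruction under assumed paths and host names, not the actual MSFF script:

```shell
#!/bin/sh
# Reconstruction of the old release push: mirror the released portion
# of the internal NFS archive out to a stand-alone web server.
# Source and destination paths are placeholders.
push_release() {
    src=$1 dest=$2
    # -a preserves attributes; --delete drops files withdrawn from release.
    # Over ~25 TB, just building the file list took hours; transfers, days.
    rsync -a --delete "$src" "$dest"
}

# Example (placeholder paths):
# push_release /nfs/themis/released/ webserver:/var/www/data/
```

Every release (and every re-release after recalibration) paid this full scan-and-copy cost, which is what the .backup approach on the next slides eliminates.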

Page 11: The Solution: AFS and .backup

Data is moved from expensive NFS to cheap AFS

AFS excels at storing large amounts of read-only data redundantly and at reasonable cost

AFS snapshot backups allow us to keep public data public and private data private

Page 12: The Solution: NFS vs AFS

NFS (Network Appliance):

High speed (trunked GigE)

High throughput (100,000 ops/sec)

Redundant (modified RAID4, clustered servers)

EXPENSIVE!!! ($5000 per TB)

vs.

AFS (CentOS Linux servers):

Fast RO, slower RW

Redundant (RAID5)

Cheap! (Less than $1000 per TB)

Page 13: The Solution: .backup Volumes

AFS .backup volumes are point-in-time copies that are independent of the original volume (a "reverse delta") – since the original volume can be altered without affecting the .backup, this is useful!

New methodology:

All volumes of released data have a .backup volume created using standard tools (vos backup)

Website references the .backup volume names

This new process takes an hour or two (depending on how many new .backup volumes are created)

Process moved from SysAdmins to Operations
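A release pass along these lines can be scripted with the standard vos tool. This is a minimal sketch, not the facility's actual script; the volume names are taken from the naming scheme described on the next slide, and the released-volume list is an assumption:

```shell
#!/bin/sh
# Refresh the public .backup snapshot of each newly released volume.
# vos backup creates (or refreshes) the .backup clone of a volume.
release_backups() {
    for vol in "$@"; do
        vos backup -id "$vol" || echo "backup of $vol failed" >&2
    done
}

# Example: refresh snapshots for two 100-orbit chunks cleared for release
# (only runs where the OpenAFS client tools are installed).
if command -v vos >/dev/null 2>&1; then
    release_backups themis.RDR.V283XXRDR themis.RDR.V284XXRDR
fi
```

Because the website serves the .backup names, re-processing a read/write volume never changes what the public sees until the next vos backup is issued for it.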

Page 14: MSFF and OpenAFS

Once processed, data is stored in AFS in 100-orbit "chunks" (AFS volumes) according to various data types, such as "themis.RDR.V284XXRDR" (THEMIS instrument container volume, RDR container volume, Visible Camera orbits 28400-28499 RDRs)

Co-Investigators at other universities access the data via authenticated AFS, FTP, and website, as it is proprietary...for a while

Public access via web, FTP, and AFS
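The mapping from an orbit number to its 100-orbit chunk volume follows mechanically from the naming scheme above. A small helper along these lines illustrates it; the themis.RDR.V&lt;NNN&gt;XXRDR pattern is from the slide, but the helper itself is hypothetical, not MSFF's actual tooling:

```shell
#!/bin/sh
# Derive the 100-orbit chunk volume name for a THEMIS VIS RDR product.
chunk_volume() {
    orbit=$1
    # Drop the last two digits: orbit 28437 falls in chunk 284 (28400-28499)
    chunk=$((orbit / 100))
    echo "themis.RDR.V${chunk}XXRDR"
}

# Example:
# chunk_volume 28437   # -> themis.RDR.V284XXRDR
```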

Page 15: MSFF OpenAFS Specifics

Cell: mars.asu.edu

AFS DB servers are Xen virtual machines

Servers:

8 AFS File Servers

CentOS 5.1 (formerly Fedora Core 4)

15,000 volumes / 35 TB of AFS storage (RAID 5)

4000 read/write volumes (8000 .readonly)

3500 .backup

Nagios monitoring of BOS, Disk Space, rxdebug
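Checks like the BOS and rxdebug monitoring mentioned above can be built from the stock OpenAFS client tools. This is a minimal sketch of such a probe; the host name is a placeholder, and the parsing of bos output is an assumption about its format:

```shell
#!/bin/sh
# Probe an AFS file server the way a Nagios check might:
# bos status reports bnode health; rxdebug confirms the fileserver
# (Rx port 7000) is answering requests.
check_fileserver() {
    server=$1
    # Any bnode line not reading "currently running normally" is a problem
    if bos status "$server" -noauth | grep -v "currently running normally" | grep -q .; then
        echo "CRITICAL: $server has unhealthy bnodes"
        return 2
    fi
    # rxdebug -version is a cheap liveness check of the Rx service
    if ! rxdebug "$server" 7000 -version >/dev/null 2>&1; then
        echo "CRITICAL: $server not answering rxdebug"
        return 2
    fi
    echo "OK: $server"
    return 0
}

# Example (placeholder host):
# check_fileserver afs1.mars.asu.edu
```

Returning 2 on failure follows the Nagios plugin convention, where exit status 2 means CRITICAL.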

Page 16: Feature Requests

Additional snapshot capability besides .backup

At least one .snapshot, but more would be nicer

File Server implied ACLs for this .snapshot

Volume Autorelease

Built-in mechanism to automatically release volumes

Better vos granularity

Allow users to release specific volumes or volume sets rather than it being all or nothing

(Open)LDAP support for PT Server

Better cron support (mostly solved by k5start)
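The k5start workaround on the last line keeps Kerberos credentials (and, via aklog, an AFS token) fresh for unattended cron jobs. A crontab entry along these lines is the typical pattern; the keytab path and script name are placeholders, not MSFF's actual configuration:

```shell
# m  h  dom mon dow  command
# Nightly release job, run with fresh Kerberos/AFS credentials:
#   -f  keytab to authenticate from (placeholder path)
#   -U  take the client principal from the keytab
#   -t  run aklog after obtaining the ticket, to get an AFS token
#   --  everything after this runs as the authenticated command
15 2 * * * k5start -f /etc/release.keytab -U -t -- /usr/local/bin/release-backups.sh
```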

Page 17: Questions

Gusev Crater (VIS – False Color)

Image Credit: NASA/JPL/ASU, Mars Express HSRC Camera, ESA/DLR/FU Berlin (G. Neukum)

Page 18: Final Remarks

Utopia Plains (IR/VIS – False Color)

Image Credit: NASA/JPL/ASU