CASJobs: A Workflow Environment Designed for Large Scientific Catalogs

Nolan Li, Johns Hopkins University

What is CASJobs

Terabytes of scientific data

Web-based system

Data distribution

Server-side analysis

Optimize user work patterns

Server-side user storage and programmability

Sloan Digital Sky Survey (SDSS) Astronomical Survey

Images (fits) - 15.7 TB

Other data products ( masks, jpeg images, etc.) (DAS, fits format) - 26.8 TB

Catalogs (CAS, SQL database) - 18 TB

Data is public

Delivery?

Database

Bandwidth is expensive!

10 terabytes is big!

So database it (SkyServer)

Partial delivery

Move work to data

Scalability

Traffic++

Complexity++

Data++

So…

Cap execution time

Cap results

Build something else
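
The two caps named above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real catalog database, and the function name `run_capped`, the `objects` table, and the limits are assumptions made for illustration, not part of SkyServer or CASJobs.

```python
import sqlite3
import time


def run_capped(conn, sql, max_seconds=1.0, max_rows=1000):
    """Run sql, aborting after max_seconds and returning at most max_rows rows."""
    deadline = time.monotonic() + max_seconds
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a nonzero return value aborts the running query.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        cursor = conn.execute(sql)
        rows = cursor.fetchmany(max_rows)            # cap on results
        truncated = cursor.fetchone() is not None    # were more rows available?
        return rows, truncated
    finally:
        conn.set_progress_handler(None, 1000)        # remove the handler


# Hypothetical sample data standing in for a catalog table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (objid INTEGER PRIMARY KEY, mag REAL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)", [(i, float(i % 10)) for i in range(5000)]
)
rows, truncated = run_capped(conn, "SELECT * FROM objects", max_rows=100)
```

A query that exceeds the time limit raises `sqlite3.OperationalError` in this sketch, so the caller can report the cap to the user instead of holding the server.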

[Figure: Monthly CAS Usage. Series: Web Hits and SQL Queries. Axis ticks run from 1.E+04 to 1.E+07 in powers of ten.]

CASJobs

Catalog Archive Server Jobs

Server-side user storage and programmability

MyDB

Hardware abstraction and long-term query portability

Contexts

Complete, automatic query logging

Scalable performance

Controlled asynchronous query execution

Data sharing

Groups

http://casjobs.sdss.org/casjobs

Server-side user database

Intermediate storage

Data import

User programmable

SELECT *
FROM DR4 a
WHERE a.objid = 38573498
   OR a.objid = 92837451
   OR a.objid = 20394833
   OR a.objid = 90284723

SELECT *
FROM DR4 a, MyDB.MyTable b
WHERE a.objid = b.objid
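
The contrast between the two queries above can be sketched in a few lines: once the object IDs live in a server-side user table, a join replaces the hand-written chain of OR conditions and returns the same rows. The sketch below uses Python's sqlite3 module as a stand-in for the catalog server; the column names `ra` and `mag` and all row values are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the catalog (the slides call it DR4); values are made up.
conn.execute("CREATE TABLE DR4 (objid INTEGER PRIMARY KEY, ra REAL, mag REAL)")
conn.executemany(
    "INSERT INTO DR4 VALUES (?, ?, ?)",
    [
        (38573498, 10.0, 17.1),
        (92837451, 20.0, 18.2),
        (20394833, 30.0, 19.3),
        (90284723, 40.0, 20.4),
        (11111111, 50.0, 21.5),  # not in the user's list
    ],
)

# Stand-in for a user table in MyDB holding the IDs of interest.
wanted = [38573498, 92837451, 20394833, 90284723]
conn.execute("CREATE TABLE MyTable (objid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyTable VALUES (?)", [(i,) for i in wanted])

# Style 1: every ID is spelled out in the query text.
or_clause = " OR ".join(f"a.objid = {i}" for i in wanted)
by_or = conn.execute(
    f"SELECT * FROM DR4 a WHERE {or_clause} ORDER BY a.objid"
).fetchall()

# Style 2: the ID list stays on the server and the query joins against it.
by_join = conn.execute(
    "SELECT a.* FROM DR4 a JOIN MyTable b ON a.objid = b.objid ORDER BY a.objid"
).fetchall()

same = by_or == by_join
```

The join form keeps the query text the same size whether the user table holds four IDs or four million.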

Logging

Automatically log all user queries

Resubmit old queries

Reconstruct database objects
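
One way to picture automatic logging with resubmission is a thin wrapper that records every statement before running it. This is a sketch under assumptions: the `LoggedSession` class, the `query_log` table, and its columns are hypothetical names, and sqlite3 stands in for the real server.

```python
import sqlite3
from datetime import datetime, timezone


class LoggedSession:
    """Runs SQL for one user and records every statement in a log table."""

    def __init__(self, conn, username):
        self.conn = conn
        self.username = username
        conn.execute(
            "CREATE TABLE IF NOT EXISTS query_log ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "username TEXT, submitted TEXT, query_text TEXT)"
        )

    def run(self, sql):
        # Log first, so that failed queries are recorded as well.
        self.conn.execute(
            "INSERT INTO query_log (username, submitted, query_text) VALUES (?, ?, ?)",
            (self.username, datetime.now(timezone.utc).isoformat(), sql),
        )
        return self.conn.execute(sql).fetchall()

    def resubmit(self, log_id):
        """Look up one of this user's old queries and run it again."""
        row = self.conn.execute(
            "SELECT query_text FROM query_log WHERE id = ? AND username = ?",
            (log_id, self.username),
        ).fetchone()
        if row is None:
            raise KeyError(log_id)
        return self.run(row[0])


conn = sqlite3.connect(":memory:")
session = LoggedSession(conn, "alice")
session.run("CREATE TABLE MyTable (objid INTEGER)")      # log entry 1
session.run("INSERT INTO MyTable VALUES (38573498)")     # log entry 2
first = session.run("SELECT COUNT(*) FROM MyTable")      # log entry 3
again = session.resubmit(3)                              # log entry 4
log_size = conn.execute("SELECT COUNT(*) FROM query_log").fetchone()[0]
```

Because the statements that created and filled `MyTable` are in the log, replaying them in order would rebuild the table, which is the idea behind reconstructing database objects.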

Contexts

Databases are identified by their data, not their location

Queries are independent of hardware configuration

SELECT TOP 10 * FROM [server].[catalog].[user].MyTable

SELECT TOP 10 * FROM DR4.MyTable
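
The pair of queries above suggests a lookup step that turns a context name into a physical location just before execution. A minimal sketch of that idea follows; the `resolve_context` function and the regex-based rewrite are assumptions for illustration (a real system would need a proper SQL parser, since this pattern would also touch text inside string literals), and the location strings reuse the placeholder form from the slide.

```python
import re

# Hypothetical mapping from context names to physical locations.
CONTEXTS = {"DR4": "[server].[catalog].[user]"}

# Matches two-part names such as DR4.MyTable or a.objid.
TWO_PART_NAME = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def resolve_context(sql, contexts):
    """Replace context.table references with their physical location."""

    def substitute(match):
        name, table = match.group(1), match.group(2)
        location = contexts.get(name)
        # Names that are not contexts (for example table aliases) stay as-is.
        return f"{location}.{table}" if location else match.group(0)

    return TWO_PART_NAME.sub(substitute, sql)


user_query = "SELECT TOP 10 * FROM DR4.MyTable"
resolved = resolve_context(user_query, CONTEXTS)

# Moving the data to another machine changes only the mapping, not the query.
moved = resolve_context(user_query, {"DR4": "[server2].[catalog].[user]"})
```

The user's saved query text never mentions a server, so it keeps working after the hardware behind the context changes.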

Quick Jobs

Executes right away

But not for very long

Restricted memory usage

For things like…

How many objects?

Table previews

Preliminary queries

System queries

Long Jobs

Asynchronous

Less restricted execution time

Storage capped by MyDB size

For things like…

Heavy IO

Heavy computation
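
The split between quick and long jobs can be sketched as two entry points on one scheduler: quick jobs run immediately and return a capped preview, while long jobs are queued and their output is written to a table when a worker gets to them. Everything named here (`JobScheduler`, `Job`, the status strings, the `objects` and `bright` tables) is hypothetical, sqlite3 stands in for the server, and the worker is a plain method call rather than a separate process.

```python
import sqlite3
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    sql: str
    output_table: str
    status: str = "ready"


class JobScheduler:
    def __init__(self, conn):
        self.conn = conn
        self.pending = deque()
        self.jobs = {}

    def quick(self, sql, max_rows=500):
        """Quick job: run right away and return a capped preview."""
        return self.conn.execute(sql).fetchmany(max_rows)

    def submit_long(self, sql, output_table):
        """Long job: queue it and return at once with a job id."""
        job = Job(len(self.jobs) + 1, sql, output_table)
        self.jobs[job.job_id] = job
        self.pending.append(job)
        return job.job_id

    def process_one(self):
        """Worker step: run the oldest pending job and store its output as a table."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        try:
            # The table name is interpolated directly; a sketch, not safe for untrusted input.
            self.conn.execute(f"CREATE TABLE {job.output_table} AS {job.sql}")
            job.status = "finished"
        except sqlite3.Error:
            job.status = "failed"
        return job


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (objid INTEGER PRIMARY KEY, mag REAL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)", [(i, float(i % 10)) for i in range(10000)]
)

scheduler = JobScheduler(conn)
count = scheduler.quick("SELECT COUNT(*) FROM objects")
job_id = scheduler.submit_long("SELECT * FROM objects WHERE mag < 5", "bright")
status_before = scheduler.jobs[job_id].status
scheduler.process_one()
status_after = scheduler.jobs[job_id].status
stored = conn.execute("SELECT COUNT(*) FROM bright").fetchone()[0]
```

Writing long-job output to a table rather than streaming it back is what ties the result size to the user's storage quota instead of to network bandwidth.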

Groups

Non-exclusive sets of CASJobs users

Share data

Keep more work at the data

SELECT * FROM myGroup.otherUser.theirTable
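
A three-part name like the one above implies an access check: the requester must belong to the group, and the owner must have shared that table with it. The sketch below shows one possible shape for that check; the `Group` class, the `can_read` function, and the access rule itself are assumptions for illustration, not the actual CASJobs permission model.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    members: set = field(default_factory=set)
    shared: set = field(default_factory=set)  # (owner, table) pairs shared with the group


def can_read(requester, reference, groups):
    """Check whether requester may read a group.owner.table reference."""
    parts = reference.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected group.owner.table, got {reference!r}")
    group_name, owner, table = parts
    group = groups.get(group_name)
    return (
        group is not None
        and requester in group.members
        and (owner, table) in group.shared
    )


# Groups are non-exclusive: a user may appear in several of them.
groups = {
    "myGroup": Group("myGroup", {"me", "otherUser"}, {("otherUser", "theirTable")}),
    "otherGroup": Group("otherGroup", {"me"}, set()),
}

allowed = can_read("me", "myGroup.otherUser.theirTable", groups)
outsider = can_read("stranger", "myGroup.otherUser.theirTable", groups)
unshared = can_read("me", "otherGroup.otherUser.theirTable", groups)
```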

Hardware

Flexible configuration

1+ machine per context (non-exclusive)

1+ machine for MyDBs

Interface

Web Site

Web Services

> two million jobs

> 2200 users

Astro deployments

Galaxy Evolution Explorer (GALEX)

Palomar Quest

Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) [3]

Non-astro deployments

Ameriflux

Swiss Institute of Bioinformatics (ISB)

[Figure: Monthly CASJobs. Axis ticks run from 100000 to 250000.]
