Pan-STARRS PS1 Published Science Products Subsystem

Pan-STARRS PS1 Published Science Products Subsystem Presentation to the PS1 Science Council August 1, 2007


Transcript of Pan-STARRS PS1 Published Science Products Subsystem

Page 1: Pan-STARRS PS1 Published Science Products Subsystem

Pan-STARRS PS1 Published Science Products Subsystem

Presentation to the PS1 Science Council August 1, 2007

Page 2: Pan-STARRS PS1 Published Science Products Subsystem

What is PSPS?

• Responsible for managing the catalogs of digital data

• PS1 PSPS will not receive image files, which are retained by IPP

• Three significant PS1 I/O threads:

– Ingest of detections and initial celestial object data from IPP

– Ingest of moving object data from MOPS

– User queries of detection/object data records

Page 3: Pan-STARRS PS1 Published Science Products Subsystem

What is PSPS?

• Web Based Interface (WBI) – the “link” with the human

• Data Retrieval Layer (DRL) – the “gate-keeper” of the data collections

• PS1 data collection managers

– Object Data Manager (ODM)

– Solar System Data Manager (SSDM)

• Provide the connection protocol for other (future/PS4) data collection managers; e.g.,

– “Postage stamp” cutouts

– Complete Metadata database

– Cumulative sky image server

– Filtered transient database (and other special clients)

[Diagram labels: Human; WBI and Other S/W Client; DRL; ODM, SSDM, and Other DM; IPP and MOPS]

Page 4: Pan-STARRS PS1 Published Science Products Subsystem

PSPS Components Overview/Terminology

• DRL: Data Retrieval Layer

– Software clients, not humans, are PDCs

– Connects to DMs

• PDC: Published Data Client

– WBI: Web Based Interface

– External PDCs (non-PSPS)

• DM: Data Manager (generic)

– ODM: Object Data Manager

– SSDM: Solar System Data Manager

[Diagram: PSPS component architecture. Labeled elements: WBI, Published Data Client, Preferred Science Client (Data Provider), Future Pan-STARRS Subsystem, Non Pan-STARRS System, DRL, ODM, SSDM, MOPS, IPP. Labeled interfaces: Standard User API, Administrator API, Data Manager API, PSPS-IPP Interface, PSPS-MOPS Interface. Labeled data flows: metadata, detections, raw science data, science data, IDs. Legend: Pan-STARRS Subsystem, PSPS Component, Future Component, Data Manager, interface contract, interface dependency.]

Page 5: Pan-STARRS PS1 Published Science Products Subsystem

PSPS Development Status

• DRL - a Request For Proposals has been issued to software developers to code the DRL designed by SAIC. This layer includes APIs to connect to the web clients and the databases.

• ODM - a cooperative agreement is being developed with Johns Hopkins University’s Department of Physics & Astronomy to develop the ODM, leveraging their experience from the Sloan Digital Sky Survey database work.

• SSDM - will be a working clone of the MOPS science client database (and a hot spare for the MOPS system).

Page 6: Pan-STARRS PS1 Published Science Products Subsystem

PSPS Development Status

• WBI - Web clients to access the ODM and SSDM will include those already developed for the MOPS, the “Gator” clone developed at IfA, and a port of the SDSS “CasJobs” client. These will use the new DRL API being developed in the lead item above.

• End-to-end testing of the PSPS structure can be accomplished using the DRL, the ported MOPS web client, and a MOPS clone on the backend. This can be done while the ODM is still under development.

Page 7: Pan-STARRS PS1 Published Science Products Subsystem

The Object Data Manager

• The ODM is the major component of the PSPS, both in terms of size and complexity. It’s more than a simple archive.

• The ODM will hold & provide user access to:

– Catalogs of all individual focal plane (P2) detections.

– Catalogs of detections from all stacked images.

– Catalogs of all derived objects.

– Catalogs of high-significance detections in difference images (when they become available).

– “Blobs” of low-significance detections from difference images.

– Sufficient metadata to allow the user to determine the provenance of any observation.

Page 8: Pan-STARRS PS1 Published Science Products Subsystem

ODM - Not Your Traditional Astronomical Database!

• Unlike SDSS or 2MASS, we are not waiting until the project is over to generate the database; we’ll publish data as we go!

• Data releases? The concept doesn’t apply here! We will probably keep monthly snapshots of the object catalog as the project proceeds.

• Our logical data structure will allow the user to track how an object’s properties change as new (better) information is added over time. (It’s possible but not necessarily easy!)
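One way to picture tracking an object's properties over time is an append-only version history that can be read back "as of" a given date. The sketch below is only an illustration of that idea: the class name, property names, and dates are all hypothetical, and the slides do not say how the ODM schema actually stores object history.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ObjectHistory:
    """Append-only record of one object's property versions (hypothetical)."""
    dates: list = field(default_factory=list)      # publication dates, ascending
    versions: list = field(default_factory=list)   # property dicts, same order

    def publish(self, when: date, properties: dict) -> None:
        # New information is appended; nothing already published is overwritten.
        if self.dates and when < self.dates[-1]:
            raise ValueError("versions must be published in date order")
        self.dates.append(when)
        self.versions.append(dict(properties))

    def as_of(self, when: date):
        # Latest version published on or before `when`, or None if none exists.
        i = bisect_right(self.dates, when)
        return self.versions[i - 1] if i else None


# Illustrative values only; not real survey data.
history = ObjectHistory()
history.publish(date(2009, 1, 1), {"ra": 150.0010, "dec": 2.2000, "n_det": 3})
history.publish(date(2009, 2, 1), {"ra": 150.0004, "dec": 2.2001, "n_det": 9})
mid_january = history.as_of(date(2009, 1, 15))
```

Keeping every version makes the "as of" lookup cheap, at the cost of storage that grows with each update, which may be part of why the slide calls this possible but not necessarily easy.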

Page 9: Pan-STARRS PS1 Published Science Products Subsystem

ODM Prototyping Goals

The prototyping effort now underway at JHU is intended to demonstrate:

– Data ingest (primarily detection to object correlation)

– Scalability (physical data schema) - aka partitioning

– Publishing (moving data from ingest pipeline to query side storage) in a way that has minimal impact on queries
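As a rough illustration of the partitioning goal, the sketch below routes rows to partitions using a small lookup table. The partitioning key (declination bands), the band edges, and the partition names are assumptions made for this example; the slides do not state how the prototype's partition map is actually keyed.

```python
from bisect import bisect_right

# Hypothetical partition map: each partition owns a declination band that
# starts at its lower edge and ends at the next partition's lower edge.
PARTITION_LOWER_DEC = [-30.0, 0.0, 30.0, 60.0]   # degrees, ascending
PARTITION_NAMES = ["p1", "p2", "p3", "p4"]
UPPER_DEC = 90.0


def partition_for(dec_deg: float) -> str:
    """Return the name of the partition whose band contains dec_deg."""
    if not (PARTITION_LOWER_DEC[0] <= dec_deg <= UPPER_DEC):
        raise ValueError(f"declination {dec_deg} is outside the mapped range")
    return PARTITION_NAMES[bisect_right(PARTITION_LOWER_DEC, dec_deg) - 1]


def route(rows):
    """Group rows (dicts with a 'dec' key) by destination partition."""
    routed = {}
    for row in rows:
        routed.setdefault(partition_for(row["dec"]), []).append(row)
    return routed


batches = route([{"id": 1, "dec": 10.0}, {"id": 2, "dec": -5.0}, {"id": 3, "dec": 12.0}])
```

Because every row maps to exactly one partition, each batch can be loaded and queried independently of the others, which is the property that makes parallel access possible.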

Page 10: Pan-STARRS PS1 Published Science Products Subsystem

Prototype ODM Structure

[Diagram: prototype ODM structure. Layers shown: Web Based Interface (WBI); Query Manager (QM); Data Storage (DS); Data Loading Pipeline (DLP); Data Transformation Layer (DX). Data Storage: a PS1 database labeled with PartionsMap, Objects, LnkToObj, Meta, and Detections, plus linked-server partition databases P1 … Pm, each with [Objects_p], [LnkToObj_p], [Detections_p], and Meta. Data Loading Pipeline: LoadAdmin with PartitionMap, plus linked-server databases LoadSupport1 … LoadSupportn, each with objZoneIndx, orphans, Detections_l, and LnkToObj_l. Legend: Database, Full table, [partitioned table], Output table, Partitioned View.]

Page 11: Pan-STARRS PS1 Published Science Products Subsystem

Existing Components (from SDSS)

The prototype will utilize the following existing SDSS components:

– Data Loading Pipeline (sqlLoader)

– Self-extracting Documentation & Diagnostics

– SQL Query Workbench (CasJobs)

– Spatial Library (Spherical/HTM)

Page 12: Pan-STARRS PS1 Published Science Products Subsystem

Functionality Under Development

New components for the prototype include:

– Data Transformation Layer (input to loader)

– Simulated Data (SDSS data & simulated galactic plane)

– Sample Queries (verify query performance)

– Cross-Match Functionality (detection-object correlation)

– Data Partitioning Procedures (partition across multi-node cluster for parallel data access)
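The cross-match item above (detection-object correlation) can be sketched in a few lines. This is a minimal illustration, not the prototype's algorithm: the declination-zone bucketing is an assumption suggested only by the objZoneIndx table name, and the zone height, match radius, function names, and data layout are all hypothetical.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5   # assumed zone height; must be at least the match radius


def zone_of(dec_deg):
    """Integer index of the declination zone containing dec_deg."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(objects, detections, radius_deg=1.0 / 3600):
    """Link each detection to its nearest object within radius_deg.

    objects and detections map an id to an (ra, dec) pair in degrees.
    Returns (links, orphans): links maps detection id to object id, and
    orphans lists the detection ids that matched no object.
    """
    zones = defaultdict(list)
    for obj_id, (ra, dec) in objects.items():
        zones[zone_of(dec)].append((obj_id, ra, dec))

    links, orphans = {}, []
    for det_id, (ra, dec) in detections.items():
        z = zone_of(dec)
        best = None
        # Only the detection's own zone and its two neighbors can hold a match.
        for candidate_zone in (z - 1, z, z + 1):
            for obj_id, obj_ra, obj_dec in zones.get(candidate_zone, ()):
                d = separation_deg(ra, dec, obj_ra, obj_dec)
                if d <= radius_deg and (best is None or d < best[0]):
                    best = (d, obj_id)
        if best is None:
            orphans.append(det_id)
        else:
            links[det_id] = best[1]
    return links, orphans


# Illustrative coordinates only.
objects = {1: (10.0, 20.0), 2: (359.9999, -0.0001)}
detections = {"a": (10.0001, 20.0001), "b": (0.00005, 0.0), "c": (50.0, 50.0)}
links, orphans = cross_match(objects, detections)
```

Bucketing by zone limits each detection to a small set of candidate objects instead of the whole catalog. This sketch scans every object in the three candidate zones; a production version would also index right ascension within each zone.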

Page 13: Pan-STARRS PS1 Published Science Products Subsystem

PS1 Logical Data Schema

Page 14: Pan-STARRS PS1 Published Science Products Subsystem

PS1 Data Tables & Sizes

tablename | cols | byte/row | rows | total (TB) | Prototype (TB) | DR1 (TB) | comments
AltModels | 7 | 1547 | 10 | 1.547E-08 | 1.547E-08 | 0 |
CameraConfig | 5 | 287 | 30 | 8.61E-09 | 8.61E-09 | 0 |
FileGroupMap | 4 | 4335 | 100 | 4.335E-07 | 4.335E-07 | 0 |
IndexMap | 7 | 2301 | 100 | 2.301E-07 | 2.301E-07 | 0 |
Objects | 88 | 420 | 5.50E+09 | 2.31 | 0.693 | 2.31 | 5 billion stars + 500 million galaxies = total number of objects
ObjZoneIndx | 7 | 63 | 5.50E+09 | 0.3465 | 0.10395 | 0.35 | for circular and especially neighbor queries [optional but good to have it at least in prototype]
PartitionMap | 3 | 4111 | 100 | 4.111E-07 | 4.111E-07 | 0 |
PhotoCal | 10 | 151 | 1000 | 1.51E-07 | 1.51E-07 | 0 | Long-term stability of camera
PhotozRecipes | 2 | 267 | 10 | 2.67E-09 | 2.67E-09 | 0 | Descriptors of photo-z algorithms
SkyCells | 2 | 10 | 50000 | 0.0000005 | 0.0000005 | 0 | Definitions of regions
Surveys | 2 | 267 | 30 | 8.01E-09 | 8.01E-09 | 0 | Survey index and text descriptor
DropP2ToObj | 4 | 39 | 4.00E+06 | 0.000156 | 1.337E-05 | 0 | Are these two really the same?
DropStackToObj | 4 | 39 | 4.00E+06 | 0.000156 | 1.337E-05 | 0 |
P2AltFits | 13 | 71 | 1.51E+10 | 1.06855 | 0.09159 | 0.31 | 10% of P2 detections x 3.5 years
P2FrameMeta | 18 | 343 | 1.05E+06 | 0.00036015 | 3.087E-05 | 0 |
P2ImageMeta | 64 | 2870 | 6.72E+07 | 0.192864 | 0.0165312 | 0.06 | 1000 images/night x 64/frame x 300 nights x 3.5 years
P2PsfFits | 34 | 183 | 1.51E+11 | 27.5415 | 2.3607 | 7.87 | total P2 detections/yr x 3.5 years
P2ToObj | 3 | 31 | 1.51E+11 | 4.6655 | 0.3999 | 1.33 | Linking table - same size as P2PsfFits detections
P2ToStack | 2 | 15 | 1.51E+11 | 2.2575 | 0.1935 | 0.65 |
StackDeltaAltFits | 13 | 71 | 3.68E+09 | 0.260925 | 0.022365 | 0.07 | 10% of StackHiSigDeltas - comets, trails etc.
StackHiSigDeltas | 32 | 167 | 3.68E+10 | 6.13725 | 0.52605 | 1.75 | 7 sq deg x 5000/image x 1000 images/night x 300 nights x 3.5 years (upper bound)
StackLowSigDelta | 2 | 5000 | 1.65E+06 | 0.00825 | 0.0007071 | 0 | Numerical noise - varbinary (FITS table) so need to get average size
StackMeta | 49 | 1551 | 700000 | 0.0010857 | 0.0003257 | 0 | 30000 (for 3-pi survey) x 5 filters (round to 200k) x 3.5 years
StackModelFits | 131 | 535 | 7.50E+09 | 4.0125 | 0.3439286 | 1.15 | number of galaxies x 3 copies x 5 filters
StackPsfFits | 44 | 215 | 8.25E+10 | 17.7375 | 1.5203571 | 5.07 | total objects x 5 filters x 3 copies
StackToObj | 4 | 39 | 8.25E+10 | 3.2175 | 0.2757857 | 0.92 | Linking table same size as StackP2Fits
StationaryTransient | 2 | 23 | 5.00E+08 | 0.0115 | 0.0009857 | 0 | Linking table - assume 10% of stars are transients
sum | | | | 69.7695986 | 6.5497356 | 21.8 | Total data size
indices | | | | 13.9539197 | 1.3099471 | 4.37 | Assume 20% overhead for database indices
total | | | | 83.7235183 | 7.8596827 | 26.2 | Total size of database
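The totals in this table follow from simple arithmetic: bytes per row times row count, expressed in terabytes, with 20% added for indices. The short sketch below reproduces a few of the figures. It assumes decimal terabytes (10^12 bytes), which is what the Objects and StackPsfFits rows imply; the function names are illustrative only.

```python
BYTES_PER_TB = 1e12     # assumption: decimal terabytes, consistent with the table
INDEX_OVERHEAD = 0.20   # "Assume 20% overhead for database indices"


def table_size_tb(bytes_per_row, rows):
    """Raw data size of one table in terabytes."""
    return bytes_per_row * rows / BYTES_PER_TB


def with_indices(data_tb):
    """Total database size after adding the assumed index overhead."""
    return data_tb * (1.0 + INDEX_OVERHEAD)


objects_tb = table_size_tb(420, 5.50e9)       # Objects row
stack_psf_tb = table_size_tb(215, 8.25e10)    # StackPsfFits row
total_tb = with_indices(69.7695986)           # "sum" row plus index overhead

print(f"Objects: {objects_tb:.2f} TB")
print(f"StackPsfFits: {stack_psf_tb:.4f} TB")
print(f"Total with indices: {total_tb:.2f} TB")
```

Rows whose counts are shown rounded (for example 1.51E+11 for P2PsfFits) will not reproduce the listed total exactly from the displayed digits.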

Page 15: Pan-STARRS PS1 Published Science Products Subsystem

User Interfaces

• The DRL authenticates “users” on a per-machine basis.

• Our initial implementation will be via a secure web server providing access to the following clients:

– A port of CasJobs from SDSS to access the ODM

– A “Gator”-like menu-driven tool to access the ODM

– Perl tools (developed by MOPS) to access the SSDM

• Machine access (PDCs) will be configured to attach to the DRL directly.

Page 16: Pan-STARRS PS1 Published Science Products Subsystem
Page 17: Pan-STARRS PS1 Published Science Products Subsystem
Page 18: Pan-STARRS PS1 Published Science Products Subsystem
Page 19: Pan-STARRS PS1 Published Science Products Subsystem
Page 20: Pan-STARRS PS1 Published Science Products Subsystem
Page 21: Pan-STARRS PS1 Published Science Products Subsystem
Page 22: Pan-STARRS PS1 Published Science Products Subsystem
Page 23: Pan-STARRS PS1 Published Science Products Subsystem
Page 24: Pan-STARRS PS1 Published Science Products Subsystem

System Expansion

• Addition of other data collections, e.g., value-added products, is accommodated within our PSPS design.

– The basic PSPS design provides well-defined APIs to the outside (WBI and PDCs) and inside (databases)

– Data collections need not be Relational Database Management Systems (RDBMS), but must obey the DRL-DM API

– Databases need not all be the same type, e.g., ODM will use MSSQL and SSDM will be built on MySQL.

• Although not part of the original PSPS design, we can provide intra-database communications (below the DRL) via well-defined mechanisms (e.g., ODBC, JDBC) to allow queries that cross the data collections hosted by the PSPS. These would be limited to read operations.
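The requirement that every data collection obey the DRL-DM API, whatever its backing store, is the familiar pattern of programming against an interface. The sketch below illustrates only that pattern. The class names, the `query` method, and the toy request format are hypothetical and are not taken from the actual DRL design.

```python
from abc import ABC, abstractmethod


class DataManager(ABC):
    """Hypothetical stand-in for the DRL-DM contract."""

    @abstractmethod
    def query(self, request: str) -> list:
        """Return the records that satisfy a read-only request."""


class InMemoryObjectDM(DataManager):
    """Toy table-like collection: the request selects rows by their 'type'."""

    def __init__(self, rows):
        self._rows = list(rows)

    def query(self, request):
        return [row for row in self._rows if row["type"] == request]


class KeyValueDM(DataManager):
    """A non-relational collection that satisfies the same contract."""

    def __init__(self, store):
        self._store = dict(store)

    def query(self, request):
        return [self._store[request]] if request in self._store else []


class DataRetrievalLayer:
    """Routes client requests to registered data managers by name."""

    def __init__(self):
        self._managers = {}

    def register(self, name, manager):
        if not isinstance(manager, DataManager):
            raise TypeError("data managers must implement the DataManager contract")
        self._managers[name] = manager

    def query(self, name, request):
        return self._managers[name].query(request)


drl = DataRetrievalLayer()
drl.register("objects", InMemoryObjectDM([{"id": 1, "type": "star"},
                                          {"id": 2, "type": "galaxy"}]))
drl.register("orbits", KeyValueDM({"K07A00A": {"a_au": 2.7}}))
stars = drl.query("objects", "star")
orbit = drl.query("orbits", "K07A00A")
```

Clients only ever talk to the retrieval layer, so a new collection can be added by registering another class that implements the same method, without changes to the clients.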

Page 25: Pan-STARRS PS1 Published Science Products Subsystem

Development Schedule

• Award DRL development contract - August 2007

• ODM Prototyping through end of September 2007

• Critical Design Review - end of October 2007

• Hire PSPS Software Engineers (IfA & JHU) - October 2007

• Complete DRL development and perform integration & end-to-end tests using MOPS DB and web interface - April 2008

• Complete integration of the ODM from JHU into the PSPS & full subsystem testing of the system - August 2008