PS1 PSPS Object Data Manager Design

PS1 PSPS Object Data Manager Design PSPS Critical Design Review November 5-6, 2007 IfA


Page 1: PS1 PSPS Object Data Manager Design

PS1 PSPS Object Data Manager Design

PSPS Critical Design Review November 5-6, 2007

IfA

Page 2: PS1 PSPS Object Data Manager Design

slide 2

Outline

ODM Overview
Critical Requirements Driving Design
Work Completed
Detailed Design
Spatial Querying [AS]
ODM Prototype [MN]
Hardware/Scalability [JV]
How Design Meets Requirements
WBS and Schedule
Issues/Risks

[AS] = Alex, [MN] = Maria, [JV] = Jan

Page 3: PS1 PSPS Object Data Manager Design

slide 3

ODM Overview

The Object Data Manager will:

Provide a scalable data archive for the Pan-STARRS data products

Provide query access to the data for Pan-STARRS users

Provide detailed usage tracking and logging

Page 4: PS1 PSPS Object Data Manager Design

slide 4

ODM Driving Requirements

Total size 100 TB
• 1.5 x 10^11 P2 detections
• 8.3 x 10^10 P2 cumulative-sky (stack) detections
• 5.5 x 10^9 celestial objects

Nominal daily rate (divide by 3.5 x 365)
• P2 detections: 120 Million/day
• Stack detections: 65 Million/day
• Objects: 4.3 Million/day

Cross-Match requirement: 120 Million / 12 hrs ~ 2800 / s

DB size requirement:
• 25 TB / yr
• ~100 TB by end of PS1 (3.5 yrs)
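The nominal rates above follow from dividing the survey totals by the 3.5-year survey length. A minimal sketch of that arithmetic is shown here; the function and variable names are illustrative and do not come from the design.

```python
# Survey totals quoted on the slide.
TOTALS = {
    "P2 detections": 1.5e11,
    "Stack detections": 8.3e10,
    "Objects": 5.5e9,
}
SURVEY_DAYS = 3.5 * 365  # PS1 duration in days


def daily_rate(total, days=SURVEY_DAYS):
    """Average number of rows per day over the whole survey."""
    return total / days


def crossmatch_rate(detections_per_day, window_hours=12):
    """Detections per second if one day's load is matched within the window."""
    return detections_per_day / (window_hours * 3600)


rates = {name: daily_rate(total) for name, total in TOTALS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e6:.1f} million/day")

# The slide rounds the P2 rate to 120 million/day before dividing by 12 hours.
print(f"Cross-match: {crossmatch_rate(120e6):.0f} detections/s")
```

The slide rounds these figures (for example, 120 Million/day and ~2800 / s), so the printed values differ slightly from the quoted ones.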

Page 5: PS1 PSPS Object Data Manager Design

slide 5

Work completed so far

Built a prototype
Scoped and built prototype hardware
Generated simulated data
• 300M SDSS DR5 objects, 1.5B Galactic plane objects
Initial Load done – Created 15 TB DB of simulated data
• Largest astronomical DB in existence today
Partitioned the data correctly using Zones algorithm
Able to run simple queries on distributed DB
Demonstrated critical steps of incremental loading
It is fast enough
• Cross-match > 60k detections/sec
• Required rate is ~3k/sec

Page 6: PS1 PSPS Object Data Manager Design

slide 6

Detailed Design

Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM: CasJobs for prototype)

Page 7: PS1 PSPS Object Data Manager Design

slide 7

High-Level Organization

[Figure: high-level organization of the ODM.
Components: Web Based Interface (WBI), Query Manager (QM), Data Storage (DS), Data Loading Pipeline (DLP), Data Transformation Layer (DX)
Data Storage (DS): databases PS1, P1 ... Pm (linked servers); tables PartionsMap, Objects, LnkToObj, Meta, Detections, [Objects_p1], [LnkToObj_p1], [Detections_p1], [Objects_pm], [LnkToObj_pm], [Detections_pm]
Data Loading Pipeline (DLP): LoadAdmin with PartitionMap; LoadSupport1 ... LoadSupportn (linked servers), each with objZoneIndx, orphans, Detections_l1 ... Detections_ln, LnkToObj_l1 ... LnkToObj_ln
Legend: Database, Full table, [partitioned table], Output table, Partitioned View]

Page 8: PS1 PSPS Object Data Manager Design

slide 8

Detailed Design

Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM: CasJobs for prototype)

Page 9: PS1 PSPS Object Data Manager Design

slide 9

Data Transformation Layer (DX)

Based on SDSS sqlFits2CSV package
• LINUX/C++ application
• FITS reader driven off header files
Convert IPP FITS files to
• ASCII CSV format for ingest (initially)
• SQL Server native binary later (3x faster)
Follow the batch and ingest verification procedure described in ICD
• 4-step batch verification
• Notification and handling of broken publication cycle
Deposit CSV or Binary input files in directory structure
• Create “ready” file in each batch directory
Stage input data on LINUX side as it comes in from IPP

Page 10: PS1 PSPS Object Data Manager Design

slide 10

DX Subtasks

DX
Initialization Job
• FITS schema
• FITS reader
• CSV Converter
• CSV Writer
Batch Ingest
• Interface with IPP
• Naming convention
• Uncompress batch
• Read batch
• Verify Batch
Batch Verification
• Verify Manifest
• Verify FITS Integrity
• Verify FITS Content
• Verify FITS Data
• Handle Broken Cycle
Batch Conversion
• CSV Converter
• Binary Converter
• “batch_ready”
• Interface with DLP

Page 11: PS1 PSPS Object Data Manager Design

slide 11

DX-DLP Interface

Directory structure on staging FS (LINUX):
• Separate directory for each JobID_BatchID
• Contains a “batch_ready” manifest file
– Name, #rows and destination table of each file
• Contains one file per destination table in ODM
– Objects, Detections, other tables
Creation of “batch_ready” file is signal to loader to ingest the batch
Batch size and frequency of ingest cycle TBD
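The handoff described above can be sketched as follows. This is only an illustration under stated assumptions: the manifest is written as CSV lines of file name, row count and destination table, the data files are named after their tables, and the manifest is renamed into place last so that its appearance can serve as the signal. The actual manifest format is defined by the ICD, and `write_batch` and `batch_is_ready` are hypothetical names.

```python
import csv
import tempfile
from pathlib import Path


def write_batch(staging_root, job_id, batch_id, tables):
    """Write one CSV file per destination table, then the manifest.

    `tables` maps a destination table name to a list of rows.
    The manifest layout used here is an assumed format.
    """
    batch_dir = Path(staging_root) / f"{job_id}_{batch_id}"
    batch_dir.mkdir(parents=True)

    manifest_rows = []
    for table, rows in tables.items():
        file_name = f"{table}.csv"
        with open(batch_dir / file_name, "w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        manifest_rows.append((file_name, len(rows), table))

    # Write the manifest under a temporary name and rename it last, so the
    # loader never sees a partially written "batch_ready" file.
    tmp = batch_dir / "batch_ready.tmp"
    with open(tmp, "w", newline="") as fh:
        csv.writer(fh).writerows(manifest_rows)
    tmp.rename(batch_dir / "batch_ready")
    return batch_dir


def batch_is_ready(batch_dir):
    """Loader-side check: ingest only once the manifest exists."""
    return (Path(batch_dir) / "batch_ready").exists()


with tempfile.TemporaryDirectory() as root:
    batch = write_batch(root, "job1", "batch1", {
        "Objects": [(1, 10.0, 20.0)],
        "Detections": [(100, 1), (101, 1)],
    })
    print(sorted(p.name for p in batch.iterdir()))
    print(batch_is_ready(batch))
```

Writing the manifest last means a loader that polls the staging area cannot pick up a batch whose data files are still being written.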

Page 12: PS1 PSPS Object Data Manager Design

slide 12

Detailed Design

Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM: CasJobs for prototype)

Page 13: PS1 PSPS Object Data Manager Design

slide 13

Data Loading Pipeline (DLP)

sqlLoader – SDSS data loading pipeline
• Pseudo-automated workflow system
• Loads, validates and publishes data
– From CSV to SQL tables
• Maintains a log of every step of loading
• Managed from Load Monitor Web interface
Has been used to load every SDSS data release
• EDR, DR1-6, ~ 15 TB of data altogether
• Most of it (since DR2) loaded incrementally
• Kept many data errors from getting into database
– Duplicate ObjIDs (symptom of other problems)
– Data corruption (CSV format invaluable in catching this)

Page 14: PS1 PSPS Object Data Manager Design

slide 14

sqlLoader Design

Existing functionality
• Shown for SDSS version
• Workflow, distributed loading, Load Monitor
New functionality
• Schema changes
• Workflow changes
• Incremental loading
– Cross-match and partitioning

Page 15: PS1 PSPS Object Data Manager Design

slide 15

sqlLoader Workflow

Distributed design achieved with linked servers and SQL Server Agent

LOAD stage can be done in parallel by loading into temporary task databases

PUBLISH stage writes from task DBs to final DB

FINISH stage creates indices and auxiliary (derived) tables

[Figure: sqlLoader workflow.
LOAD: EXP (Export), CHK (Check CSV), BLD (Build Task DBs), SQL (Build SQL Schema), VAL (Validate), BCK (Backup), DTC (Detach)
PUBLISH: PUB (Publish), CLN (Cleanup)
FINISH: FIN]

Loading pipeline is a system of VB and SQL scripts, stored procedures and functions
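The stage ordering can be illustrated with a small sketch. It is not the sqlLoader code (which is VB and SQL); the step bodies are placeholders that only record what ran, the step codes are taken from the workflow figure, and running PUBLISH one task at a time is an assumption made for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor

LOAD_STEPS = ["EXP", "CHK", "BLD", "SQL", "VAL", "BCK", "DTC"]
PUBLISH_STEPS = ["PUB", "CLN"]
FINISH_STEPS = ["FIN"]


def run_load(task, log):
    """Run every LOAD step for one task database (placeholder steps)."""
    for step in LOAD_STEPS:
        log.append((task, step))  # list.append is thread-safe in CPython
    return task


def run_pipeline(tasks, workers=4):
    log = []
    # LOAD: each task has its own temporary task DB, so tasks are independent.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        loaded = list(pool.map(lambda t: run_load(t, log), tasks))
    # PUBLISH: write each task DB into the final DB (sequential here by assumption).
    for task in loaded:
        for step in PUBLISH_STEPS:
            log.append((task, step))
    # FINISH: build indices and derived tables once, on the final DB.
    for step in FINISH_STEPS:
        log.append(("final", step))
    return log


for entry in run_pipeline(["task1", "task2"]):
    print(entry)
```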

Page 16: PS1 PSPS Object Data Manager Design

slide 16

Load Monitor Tasks Page

Page 17: PS1 PSPS Object Data Manager Design

slide 17

Load Monitor Active Tasks

Page 18: PS1 PSPS Object Data Manager Design

slide 18

Load Monitor Statistics Page

Page 19: PS1 PSPS Object Data Manager Design

slide 19

Load Monitor – New Task(s)

Page 20: PS1 PSPS Object Data Manager Design

slide 20

Data Validation
• Test Uniqueness of Primary Keys: test the unique key in each table
• Test Foreign Keys: test for consistency of keys that link tables
• Test Cardinalities: test consistency of numbers of various quantities
• Test HTM IDs: test the Hierarchical Triangular Mesh IDs used for spatial indexing
• Test Link Table Consistency: ensure that links are consistent

Tests for data integrity and consistency

Scrubs data and finds problems in upstream pipelines

Most of the validation can be performed within the individual task DB (in parallel)
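Two of these tests, primary-key uniqueness and foreign-key consistency, can be sketched with plain SQL. The example below uses Python's built-in sqlite3 module rather than SQL Server, and the column names (`objID`, `detectID`) and helper names are assumptions for illustration, not the loader's stored procedures.

```python
import sqlite3


def duplicate_keys(db, table, key):
    """Return key values that occur more than once (primary-key uniqueness test)."""
    sql = f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    return [row[0] for row in db.execute(sql)]


def orphan_links(db, child, child_key, parent, parent_key):
    """Return child keys with no matching parent row (foreign-key test)."""
    sql = (
        f"SELECT c.{child_key} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{child_key} = p.{parent_key} "
        f"WHERE p.{parent_key} IS NULL"
    )
    return [row[0] for row in db.execute(sql)]


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Objects (objID INTEGER);
    CREATE TABLE LnkToObj (objID INTEGER, detectID INTEGER);
    INSERT INTO Objects VALUES (1), (2), (2);
    INSERT INTO LnkToObj VALUES (1, 10), (3, 11);
""")
print(duplicate_keys(db, "Objects", "objID"))
print(orphan_links(db, "LnkToObj", "objID", "Objects", "objID"))
```

Both checks touch only the tables of one task database, which is why they can run in parallel across task DBs.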

Page 21: PS1 PSPS Object Data Manager Design

slide 21

Distributed Loading
[Figure: distributed loading.
Master (LoadAdmin): Master Schema, Load Monitor
Slave (LoadSupport): View of Master Schema
Samba-mounted CSV/Binary Files feed the Task DBs (Task Data) on each server
Publish: Publish Schema, Publish Data
Finish]

Page 22: PS1 PSPS Object Data Manager Design

slide 22

Schema Changes

Schema in task and publish DBs is driven off a list of schema DDL files to execute (xschema.txt)

Requires replacing DDL files in schema/sql directory and updating xschema.txt with their names

PS1 schema DDL files have already been built
Index definitions have also been created
Metadata tables will be automatically generated using metadata scripts already in the loader

Page 23: PS1 PSPS Object Data Manager Design

slide 23

Workflow Changes
Cross-Match and Partition steps will be added to the workflow
Cross-match will match detections to objects
Partition will horizontally partition data, move it to slice servers, and build DPVs on main
[Figure: modified workflow. LOAD: Export, Check CSVs, Create Task DBs, Build SQL Schema, Validate, XMatch. PUBLISH: Partition.]

Page 24: PS1 PSPS Object Data Manager Design

slide 24

Matching Detections with Objects

Algorithm described fully in prototype section
Stored procedures to cross-match detections will be part of the LOAD stage in loader pipeline
Vertical partition of Objects table kept on load server for matching with detections
Zones cross-match algorithm used to do 1” and 2” matches
Detections with no matches saved in Orphans table
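The idea behind a zone-based cross-match can be sketched in a few lines. This is an illustration only, not the stored procedures used in the loader: the zone height of 8 arcseconds, the choice of the nearest object as the match, and all function names are assumptions, and RA wrap-around at 0/360 degrees is ignored.

```python
import math
from collections import defaultdict

ARCSEC = 1.0 / 3600.0  # one arcsecond in degrees


def zone_id(dec, height):
    """Index of the declination zone that contains `dec` (degrees)."""
    return int(math.floor((dec + 90.0) / height))


def unit_vector(ra, dec):
    ra, dec = math.radians(ra), math.radians(dec)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def separation_deg(a, b):
    """Angular separation from the chord length between unit vectors."""
    chord = math.dist(unit_vector(*a), unit_vector(*b))
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0)))


def build_zone_index(objects, height):
    """objects: {obj_id: (ra, dec)} -> {zone: [(ra, dec, obj_id), ...]}"""
    zones = defaultdict(list)
    for obj_id, (ra, dec) in objects.items():
        zones[zone_id(dec, height)].append((ra, dec, obj_id))
    return zones


def cross_match(detections, objects, radius_arcsec=1.0, height=8 * ARCSEC):
    radius = radius_arcsec * ARCSEC
    zones = build_zone_index(objects, height)
    links, orphans = [], []
    for det_id, (ra, dec) in detections.items():
        # The RA window widens away from the equator.
        ra_window = radius / max(math.cos(math.radians(abs(dec) + radius)), 1e-9)
        best = None
        # Only zones overlapping [dec - radius, dec + radius] can hold a match.
        for z in range(zone_id(dec - radius, height), zone_id(dec + radius, height) + 1):
            for o_ra, o_dec, obj_id in zones.get(z, ()):
                if abs(o_ra - ra) > ra_window:
                    continue
                d = separation_deg((ra, dec), (o_ra, o_dec))
                if d <= radius and (best is None or d < best[0]):
                    best = (d, obj_id)
        if best is None:
            orphans.append(det_id)  # no object within the radius
        else:
            links.append((det_id, best[1]))
    return links, orphans


objects = {1: (10.0, 20.0), 2: (10.0, 20.01)}
detections = {100: (10.0001, 20.0), 101: (50.0, -30.0), 102: (10.0, 20.0102)}
print(cross_match(detections, objects))
```

The sketch scans each candidate zone linearly; in a database the same pruning would normally be served by an index on zone and RA, which is presumably the role of the ObjZoneIndx table.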

Page 25: PS1 PSPS Object Data Manager Design

slide 25

XMatch and Partition Data Flow

[Figure: XMatch and Partition data flow between the Loadsupport, Pm and PS1 databases.
Steps: LoadDetections, XMatch, PullChunk, MergePartitions, UpdateObjects, PullPartition, SwitchPartition
Tables: Detections, Detections_In, LinkToObj_In, ObjZoneIndx, Orphans, Detections_chunk, LinkToObj_chunk, Detections_m, LinkToObj_m, Objects_m, Objects, LinkToObj]

Page 26: PS1 PSPS Object Data Manager Design

slide 26

Detailed Design

Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM: CasJobs for prototype)

Page 27: PS1 PSPS Object Data Manager Design

slide 27

Data Storage – Schema

Page 28: PS1 PSPS Object Data Manager Design

slide 28

PS1 Table Sizes Spreadsheet

Stars 5.00E+09 1.51E+11
Galaxies 5.00E+08 36750000000
Total Objects 5.50E+09 m
2.3E-07 0.3*DR1 3.00
P2 Detections per year 4.30E+10 0.3 0.29 0.57 0.86 1.00

tablename columns bytes/row total rows total size (TB) Prototype DR1 DR2 DR3 DR4

AltModels 0 7 1547 10 1.547E-08 1.547E-08 1.547E-08 1.547E-08 1.547E-08 1.547E-08 1 1
CameraConfig 0 5 287 30 8.61E-09 8.61E-09 8.61E-09 8.61E-09 8.61E-09 8.61E-09 1 1
FileGroupMap 0 4 4335 100 4.335E-07 4.335E-07 4.335E-07 4.335E-07 4.335E-07 4.335E-07 1 1
IndexMap 0 7 2301 100 2.301E-07 2.301E-07 2.301E-07 2.301E-07 2.301E-07 2.301E-07 1 1
Objects 0 88 420 5.50E+09 2.31 0.693 2.31 2.31 2.31 2.31 1 0.33
ObjZoneIndx 0 7 63 5.50E+09 0.3465 0.10395 0.3465 0.3465 0.3465 0.3465 1 0
PartitionMap 0 3 4111 100 4.111E-07 4.111E-07 4.111E-07 4.111E-07 4.111E-07 4.111E-07 1 1
PhotoCal 0 10 151 1000 0.000000151 0.000000151 0.000000151 0.000000151 0.000000151 0.000000151 1 1
PhotozRecipes 0 2 267 10 2.67E-09 2.67E-09 2.67E-09 2.67E-09 2.67E-09 2.67E-09 1 1
SkyCells 0 2 10 50000 0.0000005 0.0000005 0.0000005 0.0000005 0.0000005 0.0000005 1 1
Surveys 0 2 267 30 8.01E-09 8.01E-09 8.01E-09 8.01E-09 8.01E-09 8.01E-09 1 1
DropP2ToObj 1 4 39 4.00E+06 0.000156 1.33714E-05 4.45714E-05 8.91429E-05 0.000133714 0.000156 1 0.33
DropStackToObj 1 4 39 4.00E+06 0.000156 1.33714E-05 4.45714E-05 8.91429E-05 0.000133714 0.000156 1 0.33
P2AltFits 1 13 71 1.51E+10 1.06855 0.09159 0.3053 0.6106 0.9159 1.06855 0 0.33
P2FrameMeta 1 18 343 1.05E+06 0.00036015 0.00003087 0.0001029 0.0002058 0.0003087 0.00036015 1 1
P2ImageMeta 1 64 2870 6.72E+07 0.192864 0.0165312 0.055104 0.110208 0.165312 0.192864 1 1
P2PsfFits 1 34 183 1.51E+11 27.5415 2.3607 7.869 15.738 23.607 27.5415 0 0.33
P2ToObj 1 3 31 1.51E+11 4.6655 0.3999 1.333 2.666 3.999 4.6655 1 0.33
P2ToStack 1 2 15 1.51E+11 2.2575 0.1935 0.645 1.29 1.935 2.2575 0 0.33
StackDeltaAltFits 1 13 71 3.68E+09 0.260925 0.022365 0.07455 0.1491 0.22365 0.260925 0 0.33
StackHiSigDeltas 1 32 167 3.68E+10 6.13725 0.52605 1.7535 3.507 5.2605 6.13725 0 0.33
StackLowSigDelta 1 2 5000 1.65E+06 0.00825 0.000707143 0.002357143 0.004714286 0.007071429 0.00825 0 0.33
StackMeta 1 49 1551 700000 0.0010857 0.00032571 0.0010857 0.0010857 0.0010857 0.0010857 0 0.33
StackModelFits 1 131 535 7.50E+09 4.0125 0.343928571 1.146428571 2.292857143 3.439285714 4.0125 0 0.33
StackPsfFits 1 44 215 8.25E+10 17.7375 1.520357143 5.067857143 10.13571429 15.20357143 17.7375 0 0.33
StackToObj 1 4 39 8.25E+10 3.2175 0.275785714 0.919285714 1.838571429 2.757857143 3.2175 1 0.33
StationaryTransient 1 2 23 5.00E+08 0.0115 0.000985714 0.003285714 0.006571429 0.009857143 0.0115 1 0.33

sum 69.76959861 6.549735569 21.83244779 41.00730812 60.18216845 69.76959861
indices 13.95391972 1.309947114 4.366489558 8.201461624 12.03643369 13.95391972
total 83.72351833 7.859682683 26.19893735 49.20876974 72.21860214 83.72351833

0 means the table size is essentially the same for all data releases
1 means the table size will grow

Primary filegroup
0 means full table
1 means the table is partitioned and distributed across the cluster

Fraction of the table contained on each partition

Note: These estimates are for the whole PS1, assuming 3.5 years. 7 bytes added to each row for overhead as suggested by Alex
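The size column in the spreadsheet can be reproduced as bytes per row times total rows, using decimal terabytes, with indices taken as 20% of the data sum. The short check below is a sketch of that arithmetic; the function names are illustrative.

```python
TB = 1e12  # bytes per terabyte (decimal), which reproduces the spreadsheet values


def table_size_tb(bytes_per_row, rows):
    """Table size in TB from the row width and the row count."""
    return bytes_per_row * rows / TB


def with_indices(data_tb, index_fraction=0.20):
    """Add the index allowance used in the spreadsheet (20% of the data size)."""
    return data_tb * (1.0 + index_fraction)


p2_rows = 4.30e10 * 3.5  # P2 detections per year x 3.5 years

print(table_size_tb(420, 5.50e9))   # Objects
print(table_size_tb(183, p2_rows))  # P2PsfFits
print(with_indices(69.76959861))    # data sum plus indices
```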

Page 29: PS1 PSPS Object Data Manager Design

slide 29

PS1 Table Sizes - All Servers

Table Year 1 Year 2 Year 3 Year 3.5

Objects 4.63 4.63 4.61 4.59

StackPsfFits 5.08 10.16 15.20 17.76

StackToObj 1.84 3.68 5.56 6.46

StackModelFits 1.16 2.32 3.40 3.96

P2PsfFits 7.88 15.76 23.60 27.60

P2ToObj 2.65 5.31 8.00 9.35

Other Tables 3.41 6.94 10.52 12.67

Indexes +20% 5.33 9.76 14.18 16.48

Total 31.98 58.56 85.07 98.87

Sizes are in TB

Page 30: PS1 PSPS Object Data Manager Design

slide 30

Data Storage – Test Queries

Drawn from several sources
• Initial set of SDSS 20 queries
• SDSS SkyServer Sample Queries
• Queries from PS scientists (Monet, Howell, Kaiser, Heasley)
Two objectives
• Find potential holes/issues in schema
• Serve as test queries
– Test DBMS integrity
– Test DBMS performance
Loaded into CasJobs (Query Manager) as sample queries for prototype

Page 31: PS1 PSPS Object Data Manager Design

slide 31

Data Storage – DBMS

Microsoft SQL Server 2005
• Relational DBMS with excellent query optimizer
Plus
• Spherical/HTM (C# library + SQL glue)
– Spatial index (Hierarchical Triangular Mesh)
• Zones (SQL library)
– Alternate spatial decomposition with dec zones
• Many stored procedures and functions
– From coordinate conversions to neighbor search functions
• Self-extracting documentation (metadata) and diagnostics

Page 32: PS1 PSPS Object Data Manager Design

slide 32

Documentation and Diagnostics

Page 33: PS1 PSPS Object Data Manager Design

slide 33

Data Storage – Scalable Architecture

Monolithic database design (a la SDSS) will not do it
SQL Server does not have cluster implementation
• Do it by hand
Partitions vs Slices
• Partitions are file-groups on the same server
– Parallelize disk accesses on the same machine
• Slices are data partitions on separate servers
• We use both!
Additional slices can be added for scale-out
For PS1, use SQL Server Distributed Partitioned Views (DPVs)

Page 34: PS1 PSPS Object Data Manager Design

slide 34

Distributed Partitioned Views

Difference between DPVs and file-group partitioning
• FG on same database
• DPVs on separate DBs
• FGs are for scale-up
• DPVs are for scale-out

Main server has a view of a partitioned table that includes remote partitions (we call them slices to distinguish them from FG partitions)

Accomplished with SQL Server’s linked server technology

NOT truly parallel, though
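The behaviour of a view over remote partitions can be modelled with a toy example. The sketch below is not SQL Server code: the `Slice` and `PartitionedView` classes, the integer partitioning key and the range bounds are all assumptions made for illustration. It shows the head node routing a range query only to the slices whose key range overlaps it, and visiting them one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """One member table of the view, held on a separate server."""
    name: str
    lo: int  # inclusive lower bound of the partitioning key
    hi: int  # exclusive upper bound
    rows: list = field(default_factory=list)  # (key, payload) tuples

    def scan(self, lo, hi):
        return [r for r in self.rows if lo <= r[0] < hi]


class PartitionedView:
    """Head-node view: the union of all slices, pruned by key range."""

    def __init__(self, slices):
        self.slices = sorted(slices, key=lambda s: s.lo)

    def insert(self, key, payload):
        for s in self.slices:
            if s.lo <= key < s.hi:
                s.rows.append((key, payload))
                return s.name
        raise KeyError(f"no slice covers key {key}")

    def query(self, lo, hi):
        touched, result = [], []
        for s in self.slices:  # visited one after another, not in parallel
            if s.lo < hi and lo < s.hi:
                touched.append(s.name)
                result.extend(s.scan(lo, hi))
        return touched, result


view = PartitionedView([
    Slice("Detections_S1", 0, 100),
    Slice("Detections_S2", 100, 200),
    Slice("Detections_S3", 200, 300),
])
for key, payload in [(5, "a"), (150, "b"), (250, "c")]:
    view.insert(key, payload)
print(view.query(120, 180))
```

In SQL Server the pruning relies on CHECK constraints declared on the partitioning column of each member table; the sequential loop stands in for the "not truly parallel" caveat above.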

Page 35: PS1 PSPS Object Data Manager Design

slide 35

Scalable Data Architecture

Shared-nothing architecture
Detections split across cluster
Objects replicated on Head and Slice DBs
DPVs of Detections tables on the Headnode DB
Queries on Objects stay on head node
Queries on detections use only local data on slices
[Figure: Head node and slices S1, S2, S3. Labels: Objects, Objects_S1, Objects_S2, Objects_S3, Detections_S1, Detections_S2, Detections_S3, Detections DPV.]

Page 36: PS1 PSPS Object Data Manager Design

slide 36

Hardware - Prototype

[Figure: prototype hardware layout.
Servers: LX (PS01), L1 (PS13), L2/M (PS05), Head (PS11), S1 (PS12), S2 (PS03), S3 (PS04), W (PS02)
Function (as listed): Staging, Loading, DB, MyDB, Web
Total space (as listed): 10 TB, 9 TB, 39 TB, 0 TB
RAID config (as listed): RAID5, RAID10, RAID10, RAID10
Disk/rack config (as listed): 14D/3.5W, 12D/4W
Storage: 10A = 10 x [13 x 750 GB], 3B = 3 x [12 x 500 GB]
Function: LX = Linux, L = Load server, S/Head = DB server, M = MyDB server, W = Web server
Server Naming Convention: PS0x = 4-core, PS1x = 8-core]

Page 37: PS1 PSPS Object Data Manager Design

slide 37

Hardware – PS1

[Figure: ping-pong configuration shown in successive states. Each state has three copies (Copy 1, Copy 2, Copy 3) labelled Live, Offline or Spare; arrows are labelled Queries, Ingest and Replicate.]

Ping-pong configuration to maintain high availability and query performance

2 copies of each slice and of main (head) node database on fast hardware (hot spares)

3rd spare copy on slow hardware (can be just disk)

Updates/ingest on offline copy then switch copies when ingest and replication finished

Synchronize second copy while first copy is online

Both copies live when no ingest

3x basic config. for PS1

Page 38: PS1 PSPS Object Data Manager Design

slide 38

Detailed Design

Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM: CasJobs for prototype)

Page 39: PS1 PSPS Object Data Manager Design

slide 39

Query Manager

Based on SDSS CasJobs
Configure to work with distributed database, DPVs
Direct links (contexts) to slices can be added later if necessary
Segregates quick queries from long ones
Saves query results server-side in MyDB
Gives users a powerful query workbench
Can be scaled out to meet any query load
PS1 Sample Queries available to users
PS1 Prototype QM demo

Page 40: PS1 PSPS Object Data Manager Design

slide 40

ODM Prototype Components

Data Loading Pipeline
Data Storage
CasJobs
• Query Manager (QM)
• Web Based Interface (WBI)
Testing

Page 41: PS1 PSPS Object Data Manager Design

slide 41

Spatial Queries (Alex)

Page 42: PS1 PSPS Object Data Manager Design

slide 42

Prototype (Maria)

Page 43: PS1 PSPS Object Data Manager Design

slide 43

Hardware/Scalability (Jan)

Page 44: PS1 PSPS Object Data Manager Design

slide 44

How Design Meets Requirements

Cross-matching detections with objects
• Zone cross-match part of loading pipeline
• Already exceeded requirement with prototype
Query performance
• Ping-pong configuration for query during ingest
• Spatial indexing and distributed queries
• Query manager can be scaled out as necessary
Scalability
• Shared-nothing architecture
• Scale out as needed
• Beyond PS1 we will need truly parallel query plans

Page 45: PS1 PSPS Object Data Manager Design

slide 45

WBS/Development Tasks

[Figure: WBS chart of development tasks.
Tasks: Refine Prototype/Schema, Staging/Transformation, Initial Load, Load/Resolve Detections, Resolve/Synchronize Objects, Create Snapshot, Replication Module, Query Processing, Hardware, Documentation, Testing, Redistribute Data
Annotations: Workflow Systems, Logging, Data Scrubbing, SSIS (?) + C#, QM/Logging
Effort per chart box (as listed): 2 PM, 3 PM, 1 PM, 3 PM, 3 PM, 1 PM, 2 PM, 2 PM, 2 PM, 2 PM, 4 PM, 4 PM, 4 PM, 2 PM]

Total Effort: 35 PM
Delivery: 9/2008

Page 46: PS1 PSPS Object Data Manager Design

slide 46

Personnel Available

2 new hires (SW Engineers) 100%
Maria 80%
Ani 20%
Jan 10%
Alainna 15%
Nolan Li 25%
Sam Carliles 25%
George Fekete 5%
Laszlo Dobos 50% (for 6 months)

Page 47: PS1 PSPS Object Data Manager Design

slide 47

Issues/Risks

Versioning
• Do we need to preserve snapshots of monthly versions?
• How will users reproduce queries on subsequent versions?
• Is it ok that a new version of the sky replaces the previous one every month?
Backup/recovery
• Will we need 3 local copies rather than 2 for safety?
• Is restoring from offsite copy feasible?
Handoff to IfA beyond scope of WBS shown
• This will involve several PMs

Page 48: PS1 PSPS Object Data Manager Design

Mahalo!

Page 49: PS1 PSPS Object Data Manager Design

slide 49

Context that query is executed in

MyDB table that query results go into

Name that this query job is given

Check query syntax

Get graphical query plan

Run query in quick (1 minute) mode

Submit query to long (8-hour) queue

Query buffer

Load one of the sample queries into query buffer

Query Manager

Page 50: PS1 PSPS Object Data Manager Design

slide 50

Stored procedure arguments

SQL code for stored procedure

Query Manager

Page 51: PS1 PSPS Object Data Manager Design

slide 51

MyDB context is the default, but other contexts can be selected

The space used and total space available

Multiple tables can be selected and dropped at once

Table list can be sorted by name, size, type.

User can browse DB Views, Tables, Functions and Procedures

Query Manager

Page 52: PS1 PSPS Object Data Manager Design

slide 52

The query that created this table

Query Manager

Page 53: PS1 PSPS Object Data Manager Design

slide 53

Search radius

Table to hold results

Context to run search on

Query Manager