PS1 PSPS Object Data Manager Design
PSPS Critical Design Review November 5-6, 2007
IfA
slide 2
Outline
ODM Overview
Critical Requirements Driving Design
Work Completed
Detailed Design
Spatial Querying [AS]
ODM Prototype [MN]
Hardware/Scalability [JV]
How Design Meets Requirements
WBS and Schedule
Issues/Risks
[AS] = Alex, [MN] = Maria, [JV] = Jan
slide 3
ODM Overview
The Object Data Manager will:
Provide a scalable data archive for the Pan-STARRS data products
Provide query access to the data for Pan-STARRS users
Provide detailed usage tracking and logging
slide 4
ODM Driving Requirements
Total size 100 TB:
• 1.5 × 10^11 P2 detections
• 8.3 × 10^10 P2 cumulative-sky (stack) detections
• 5.5 × 10^9 celestial objects
Nominal daily rate (divide by 3.5 × 365):
• P2 detections: 120 million/day
• Stack detections: 65 million/day
• Objects: 4.3 million/day
Cross-match requirement: 120 million / 12 hrs ≈ 2800/s
DB size requirement:
• 25 TB/yr
• ~100 TB by end of PS1 (3.5 yrs)
slide 5
Work completed so far
Built a prototype
Scoped and built prototype hardware
Generated simulated data
• 300M SDSS DR5 objects, 1.5B Galactic plane objects
Initial load done – created a 15 TB DB of simulated data
• Largest astronomical DB in existence today
Partitioned the data correctly using the Zones algorithm
Able to run simple queries on the distributed DB
Demonstrated critical steps of incremental loading
It is fast enough:
• Cross-match > 60k detections/sec
• Required rate is ~3k/sec
slide 6
Detailed Design
Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM; CasJobs for prototype)
slide 7
High-Level Organization
[Architecture diagram. Web Based Interface (WBI) → Query Manager (QM) → Data Storage (DS). The DS head database (PS1) holds the PartitionsMap, Objects, LnkToObj, and Meta tables plus partitioned views of the detections; slice servers P1…Pm hold the partitioned tables [Objects_p1…pm], [LnkToObj_p1…pm], and [Detections_p1…pm] with their own Meta tables, reached via linked servers. The Data Loading Pipeline (DLP) consists of a LoadAdmin server plus LoadSupport1…LoadSupportn servers, each holding objZoneIndx, orphans, Detections_l, and LnkToObj_l tables and a PartitionMap, also connected via linked servers; it is fed by the Data Transformation Layer (DX). Legend: database, full table, [partitioned table], output table, partitioned view.]
slide 8
Detailed Design
Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM; CasJobs for prototype)
slide 9
Data Transformation Layer (DX)
Based on SDSS sqlFits2CSV package
• LINUX/C++ application
• FITS reader driven off header files
Convert IPP FITS files to
• ASCII CSV format for ingest (initially)
• SQL Server native binary later (3x faster)
Follow the batch and ingest verification procedure described in the ICD
• 4-step batch verification
• Notification and handling of broken publication cycle
Deposit CSV or binary input files in directory structure
• Create “ready” file in each batch directory
Stage input data on the LINUX side as it comes in from IPP
slide 10
DX Subtasks
[DX subtask diagram:
Initialization Job – FITS schema, FITS reader, CSV converter, CSV writer
Batch Ingest – interface with IPP, naming convention, uncompress batch, read batch, verify batch
Batch Verification – verify manifest, verify FITS integrity, verify FITS content, verify FITS data, handle broken cycle
Batch Conversion – CSV converter, binary converter, “batch_ready”, interface with DLP]
slide 11
DX-DLP Interface
Directory structure on staging FS (LINUX):
• Separate directory for each JobID_BatchID
• Contains a “batch_ready” manifest file
– Name, #rows, and destination table of each file (see the example below)
• Contains one file per destination table in the ODM
– Objects, Detections, other tables
Creation of the “batch_ready” file is the signal to the loader to ingest the batch
Batch size and frequency of ingest cycle TBD
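For illustration, the manifest amounts to one line per file giving its name, row count, and destination table; a hypothetical example (the real layout is defined in the ICD, and the JobID/BatchID and counts here are invented):

    # batch_ready for Job42_Batch7 (hypothetical)
    Objects.csv        4300000   Objects
    P2PsfFits.csv    120000000   P2PsfFits
    P2ToObj.csv      120000000   P2ToObj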
slide 12
Detailed Design
Reuse SDSS software as much as possible Data Transformation Layer (DX) – Interface to IPP Data Loading Pipeline (DLP) Data Storage (DS)
• Schema and Test Queries• Database Management System• Scalable Data Architecture• Hardware
Query Manager (QM: CasJobs for prototype)
slide 13
Data Loading Pipeline (DLP)
sqlLoader – SDSS data loading pipeline
• Pseudo-automated workflow system
• Loads, validates and publishes data
– From CSV to SQL tables
• Maintains a log of every step of loading
• Managed from the Load Monitor Web interface
Has been used to load every SDSS data release
• EDR, DR1-6, ~15 TB of data altogether
• Most of it (since DR2) loaded incrementally
• Kept many data errors from getting into the database
– Duplicate ObjIDs (symptom of other problems)
– Data corruption (CSV format invaluable in catching this)
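At its core, the CSV-to-table step is SQL Server bulk loading; a minimal sketch for one file, with hypothetical path and table names (the pipeline wraps this in stored procedures with logging and error handling):

    -- Bulk-load one CSV batch file into a task-DB table (names illustrative).
    BULK INSERT TaskDB.dbo.P2PsfFits
    FROM '\\lxps01\staging\Job42_Batch7\P2PsfFits.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK         -- table lock enables a faster, minimally logged load
    );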
slide 14
sqlLoader Design
Existing functionality
• Shown for SDSS version
• Workflow, distributed loading, Load Monitor
New functionality
• Schema changes
• Workflow changes
• Incremental loading
slide 15
sqlLoader Workflow
Distributed design achieved with linked servers and SQL Server Agent
LOAD stage can be done in parallel by loading into temporary task databases
PUBLISH stage writes from task DBs to final DB
FINISH stage creates indices and auxiliary (derived) tables
[Workflow diagram: the LOAD stage comprises Export (EXP), Check CSV (CHK), Build Task DBs (BLD), Build SQL Schema (SQL), Validate (VAL), Backup (BCK), and Detach (DTC); the PUBLISH stage comprises Publish (PUB) and Cleanup (CLN); the FINISH stage (FIN) completes the pipeline.]
Loading pipeline is a system of VB and SQL scripts, stored procedures and functions
slide 16
Load Monitor Tasks Page
slide 17
Load Monitor Active Tasks
slide 18
Load Monitor Statistics Page
slide 19
Load Monitor – New Task(s)
slide 20
Data Validation
Tests for data integrity and consistency
Scrubs data and finds problems in upstream pipelines
Most of the validation can be performed within the individual task DB (in parallel)
[Validation diagram:
Test Uniqueness of Primary Keys – test the unique key in each table
Test Foreign Keys – test for consistency of keys that link tables
Test Cardinalities – test consistency of numbers of various quantities
Test HTM IDs – test the Hierarchical Triangular Mesh IDs used for spatial indexing
Test Link Table Consistency – ensure that links are consistent]
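Each of these checks reduces to a simple aggregate or anti-join that can run inside the task DB; two sketches with illustrative table and column names:

    -- Primary-key uniqueness: any objID occurring twice is an error.
    SELECT objID, COUNT(*) AS n
    FROM Objects
    GROUP BY objID
    HAVING COUNT(*) > 1;

    -- Foreign-key consistency: detections linking to a nonexistent object.
    SELECT COUNT(*) AS brokenLinks
    FROM P2ToObj AS p
    LEFT JOIN Objects AS o ON o.objID = p.objID
    WHERE o.objID IS NULL;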
slide 21
Distributed Loading
[Diagram: a Master (LoadAdmin) server holds the master schema, the Load Monitor, and the publish schema; Slave (LoadSupport) servers each run task DBs (TaskData) built from views of the master schema, loading Samba-mounted CSV/binary files in parallel; Publish Data and Finish steps then write from the task DBs into the final database.]
slide 22
Schema Changes
Schema in task and publish DBs is driven off a list of schema DDL files to execute (xschema.txt)
Requires replacing DDL files in the schema/sql directory and updating xschema.txt with their names
PS1 schema DDL files have already been built
Index definitions have also been created
Metadata tables will be automatically generated using metadata scripts already in the loader
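In other words, swapping in the PS1 schema is a file-list change; a hypothetical xschema.txt (file names invented for illustration):

    PS1Tables.sql
    PS1Indexes.sql
    PS1Views.sql
    PS1Procedures.sql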
slide 23
Workflow Changes
Cross-Match and Partition steps will be added to the workflow
Cross-match will match detections to objects
Partition will horizontally partition data, move it to slice servers, and build DPVs on the main server
[Diagram: the LOAD stage becomes Export → Check CSVs → Create Task DBs → Build SQL Schema → Validate → XMatch; the PUBLISH stage gains a Partition step.]
slide 24
Matching Detections with Objects
Algorithm described fully in prototype section
Stored procedures to cross-match detections will be part of the LOAD stage in the loader pipeline
Vertical partition of the Objects table kept on the load server for matching with detections
Zones cross-match algorithm used to do 1″ and 2″ matches (sketched below)
Detections with no matches saved in the Orphans table
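The zones algorithm turns the spherical match into an equality/range join that the query optimizer executes efficiently. A minimal sketch of the 1″ pass, assuming simplified table and column names and unit-vector coordinates (cx, cy, cz); the production procedures also widen the RA window by 1/cos(dec) and handle zone-boundary and RA-wraparound cases:

    -- Zones are fixed-height declination stripes:
    --   zoneID = floor((dec + 90) / zoneHeight)
    DECLARE @zoneHeight float, @radius float;
    SET @zoneHeight = 4.0 / 3600.0;   -- 4-arcsec zones (height is a tuning choice)
    SET @radius     = 1.0 / 3600.0;   -- 1-arcsec match radius, in degrees

    SELECT d.detectID, o.objID
    FROM   Detections  AS d
    JOIN   ObjZoneIndx AS o
      ON   o.zoneID BETWEEN d.zoneID - 1 AND d.zoneID + 1      -- adjacent zones
     AND   o.ra     BETWEEN d.ra - @radius AND d.ra + @radius  -- coarse RA cut
    WHERE  POWER(d.cx - o.cx, 2) + POWER(d.cy - o.cy, 2) + POWER(d.cz - o.cz, 2)
           < POWER(2 * SIN(RADIANS(@radius) / 2), 2);          -- exact chord test

Clustering both tables on (zoneID, ra) turns this join into a sequential scan per zone, which is what makes bulk cross-match rates like the prototype's possible.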
slide 25
XMatch and Partition Data Flow
[Data-flow diagram: detections enter the Loadsupport DB (LoadDetections → Detections_In) and are cross-matched against ObjZoneIndx (XMatch → LinkToObj_In), with unmatched detections going to Orphans; PullChunk produces Detections_chunk and LinkToObj_chunk, which MergePartitions merges into the per-slice tables Detections_m and LinkToObj_m on slice Pm; UpdateObjects refreshes Objects_m, and Pull/PartitionSwitch/Partition move Objects_m and LinkToObj_m into the Objects and LnkToObj tables of the PS1 head database.]
slide 26
Detailed Design
Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM; CasJobs for prototype)
slide 27
Data Storage – Schema
slide 28
PS1 Table Sizes Spreadsheet
Assumptions: Stars 5.00E+09 / 1.51E+11; Galaxies 5.00E+08 / 3.675E+10; total objects 5.50E+09; P2 detections per year 4.30E+10. Release size fractions of full PS1: Prototype = 0.3 × DR1, DR1 = 0.29, DR2 = 0.57, DR3 = 0.86, DR4 = 1.00.

tablename            grows  cols  bytes/row  rows       size (TB)  FG  frac
AltModels              0      7     1547     10         1.5E-08     1   1
CameraConfig           0      5      287     30         8.6E-09     1   1
FileGroupMap           0      4     4335     100        4.3E-07     1   1
IndexMap               0      7     2301     100        2.3E-07     1   1
Objects                0     88      420     5.50E+09   2.31        1   0.33
ObjZoneIndx            0      7       63     5.50E+09   0.35        1   0
PartitionMap           0      3     4111     100        4.1E-07     1   1
PhotoCal               0     10      151     1000       1.5E-07     1   1
PhotozRecipes          0      2      267     10         2.7E-09     1   1
SkyCells               0      2       10     50000      5.0E-07     1   1
Surveys                0      2      267     30         8.0E-09     1   1
DropP2ToObj            1      4       39     4.00E+06   1.6E-04     1   0.33
DropStackToObj         1      4       39     4.00E+06   1.6E-04     1   0.33
P2AltFits              1     13       71     1.51E+10   1.07        0   0.33
P2FrameMeta            1     18      343     1.05E+06   3.6E-04     1   1
P2ImageMeta            1     64     2870     6.72E+07   0.19        1   1
P2PsfFits              1     34      183     1.51E+11   27.54       0   0.33
P2ToObj                1      3       31     1.51E+11   4.67        1   0.33
P2ToStack              1      2       15     1.51E+11   2.26        0   0.33
StackDeltaAltFits      1     13       71     3.68E+09   0.26        0   0.33
StackHiSigDeltas       1     32      167     3.68E+10   6.14        0   0.33
StackLowSigDelta       1      2     5000     1.65E+06   0.008       0   0.33
StackMeta              1     49     1551     7.00E+05   0.001       0   0.33
StackModelFits         1    131      535     7.50E+09   4.01        0   0.33
StackPsfFits           1     44      215     8.25E+10   17.74       0   0.33
StackToObj             1      4       39     8.25E+10   3.22        1   0.33
StationaryTransient    1      2       23     5.00E+08   0.012       1   0.33

Totals (TB): tables 69.77; indices (+20%) 13.95; grand total 83.72.
Totals by release (TB): Prototype 7.86; DR1 26.20; DR2 49.21; DR3 72.22; DR4 83.72. Per-table sizes at each release follow the fractions above (non-growing tables stay constant after DR1).

grows: 0 means the table size is essentially the same for all data releases; 1 means the table size will grow.
FG (primary filegroup): 0 means full table; 1 means the table is partitioned and distributed across the cluster.
frac: fraction of the table contained on each partition.
Note: these estimates are for the whole of PS1, assuming 3.5 years; 7 bytes are added to each row for overhead, as suggested by Alex.
slide 29
PS1 Table Sizes - All Servers
Table            Year 1   Year 2   Year 3   Year 3.5
Objects            4.63     4.63     4.61     4.59
StackPsfFits       5.08    10.16    15.20    17.76
StackToObj         1.84     3.68     5.56     6.46
StackModelFits     1.16     2.32     3.40     3.96
P2PsfFits          7.88    15.76    23.60    27.60
P2ToObj            2.65     5.31     8.00     9.35
Other Tables       3.41     6.94    10.52    12.67
Indexes (+20%)     5.33     9.76    14.18    16.48
Total             31.98    58.56    85.07    98.87
Sizes are in TB
slide 30
Data Storage – Test Queries
Drawn from several sources
• Initial set of SDSS 20 queries
• SDSS SkyServer Sample Queries
• Queries from PS scientists (Monet, Howell, Kaiser, Heasley)
Two objectives
• Find potential holes/issues in schema
• Serve as test queries
– Test DBMS integrity
– Test DBMS performance
Loaded into CasJobs (Query Manager) as sample queries for the prototype
slide 31
Data Storage – DBMS
Microsoft SQL Server 2005
• Relational DBMS with excellent query optimizer
Plus
• Spherical/HTM (C# library + SQL glue)
– Spatial index (Hierarchical Triangular Mesh)
• Zones (SQL library)
– Alternate spatial decomposition with dec zones
• Many stored procedures and functions
– From coordinate conversions to neighbor search functions
• Self-extracting documentation (metadata) and diagnostics
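The neighbor search functions let users express cone searches directly in SQL; a sketch in the SkyServer style, assuming the SDSS function fGetNearbyObjEq (ra and dec in degrees, radius in arcminutes) is carried over to the PS1 schema:

    -- All objects within 1 arcminute of (ra, dec) = (180.0, 0.0).
    SELECT o.objID, o.ra, o.dec, n.distance
    FROM dbo.fGetNearbyObjEq(180.0, 0.0, 1.0) AS n
    JOIN Objects AS o ON o.objID = n.objID;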
slide 32
Documentation and Diagnostics
slide 33
Data Storage – Scalable Architecture
Monolithic database design (à la SDSS) will not do it
SQL Server does not have a cluster implementation
• Do it by hand
Partitions vs slices
• Partitions are file-groups on the same server
– Parallelize disk accesses on the same machine
• Slices are data partitions on separate servers
• We use both!
Additional slices can be added for scale-out
For PS1, use SQL Server Distributed Partitioned Views (DPVs)
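Filegroup partitioning within one server maps ranges of a key to different disk volumes; a minimal SQL Server 2005 sketch, with hypothetical zoneID boundaries and assuming filegroups fg1–fg4 already exist:

    -- Map declination-zone ranges to four filegroups on the same server.
    CREATE PARTITION FUNCTION pfZone (int)
    AS RANGE RIGHT FOR VALUES (20000, 40000, 60000);   -- illustrative cuts

    CREATE PARTITION SCHEME psZone
    AS PARTITION pfZone TO (fg1, fg2, fg3, fg4);       -- one filegroup per range

    CREATE TABLE Detections_p (
        detectID  bigint NOT NULL,
        zoneID    int    NOT NULL,
        ra        float  NOT NULL,
        dec       float  NOT NULL,
        PRIMARY KEY (zoneID, detectID)
    ) ON psZone(zoneID);   -- rows are placed by zoneID range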
slide 34
Distributed Partitioned Views
Difference between DPVs and file-group partitioning• FG on same database• DPVs on separate DBs• FGs are for scale-up• DPVs are for scale-out
Main server has a view of a partitioned table that includes remote partitions (we call them slices to distinguish them from FG partitions)
Accomplished with SQL Server’s linked server technology
NOT truly parallel, though
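Concretely, a DPV is a UNION ALL view over identically structured member tables reached through linked servers; a schematic sketch assuming slice servers S1–S3 are registered as linked servers and hold the per-slice detection tables:

    -- Head-node view spanning the remote detection slices.
    -- CHECK constraints on the partitioning column of each member table
    -- let the optimizer route a query to only the relevant slice.
    CREATE VIEW Detections AS
        SELECT * FROM S1.PS1.dbo.Detections_S1
        UNION ALL
        SELECT * FROM S2.PS1.dbo.Detections_S2
        UNION ALL
        SELECT * FROM S3.PS1.dbo.Detections_S3;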
slide 35
Scalable Data Architecture
Shared-nothing architecture
Detections split across cluster
Objects replicated on Head and Slice DBs
DPVs of Detections tables on the Head node DB
Queries on Objects stay on head node
[Diagram: the Head node holds the full Objects table, the replicated Objects_S1–Objects_S3 copies, and the Detections DPV over Detections_S1–Detections_S3; slices S1, S2, S3 each hold their local Detections and Objects tables.]
Queries on detections use only local data on slices
slide 36
Hardware - Prototype
[Diagram: prototype cluster and storage layout.
Server naming convention: LX = Linux, L = load server, S/Head = DB server, M = MyDB server, W = web server; PS0x = 4-core, PS1x = 8-core.
Servers: LXPS01 (Linux staging, 10 TB, RAID5), L1PS13 and L2/MPS05 (loading, 9 TB, RAID10), HeadPS11 (8-core head DB), S1PS12 (8-core slice), S2PS03 and S3PS04 (4-core slices), WPS02 (4-core web server, MyDB); 39 TB total DB space on RAID10.
Storage racks: 10A = 10 × [13 × 750 GB], 3B = 3 × [12 × 500 GB]; disk/rack configurations 14D/3.5W and 12D/4W.]
slide 37
Hardware – PS1
[Diagram: ping-pong configuration. Each database exists as three copies: a Live copy serving queries, an Offline copy receiving ingest, and a Spare copy. Queries run against the live copy while ingest and replication proceed on the offline copy; when replication finishes, the copies swap roles.]
Ping-pong configuration to maintain high availability and query performance
2 copies of each slice and of the main (head) node database on fast hardware (hot spares)
3rd spare copy on slow hardware (can be just disk)
Updates/ingest on the offline copy, then switch copies when ingest and replication are finished
Synchronize second copy while first copy is online
Both copies live when no ingest
3× the basic configuration for PS1
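One lightweight way to realize the switch is to reach the live copy through a synonym and repoint it when ingest completes; a hypothetical sketch (the database names PS1_A and PS1_B are invented for illustration):

    -- Flip the query endpoint from copy A to copy B after ingest.
    DROP SYNONYM dbo.Objects_live;
    CREATE SYNONYM dbo.Objects_live FOR PS1_B.dbo.Objects;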
slide 38
Detailed Design
Reuse SDSS software as much as possible
Data Transformation Layer (DX) – Interface to IPP
Data Loading Pipeline (DLP)
Data Storage (DS)
• Schema and Test Queries
• Database Management System
• Scalable Data Architecture
• Hardware
Query Manager (QM; CasJobs for prototype)
slide 39
Query Manager
Based on SDSS CasJobs
Configured to work with the distributed database and DPVs
Direct links (contexts) to slices can be added later if necessary
Segregates quick queries from long ones
Saves query results server-side in MyDB
Gives users a powerful query workbench
Can be scaled out to meet any query load
PS1 Sample Queries available to users
PS1 Prototype QM demo
slide 40
ODM Prototype Components
Data Loading Pipeline
Data Storage
CasJobs
• Query Manager (QM)
• Web Based Interface (WBI)
Testing
slide 41
Spatial Queries (Alex)
slide 42
Prototype (Maria)
slide 43
Hardware/Scalability (Jan)
slide 44
How Design Meets Requirements
Cross-matching detections with objects
• Zone cross-match part of loading pipeline
• Already exceeded requirement with prototype
Query performance
• Ping-pong configuration for querying during ingest
• Spatial indexing and distributed queries
• Query Manager can be scaled out as necessary
Scalability
• Shared-nothing architecture
• Scale out as needed
• Beyond PS1 we will need truly parallel query plans
slide 45
WBS/Development Tasks
Refine Prototype/Schema
Staging/Transformation
Initial Load
Load/Resolve Detections
Resolve/Synchronize Objects
Create Snapshot
Replication Module
Redistribute Data
Query Processing
• Workflow Systems
• Logging
• Data Scrubbing
• SSIS (?) + C#
• QM/Logging
Hardware
Testing
Documentation
Per-task effort (PM, in slide order): 2, 3, 1, 3, 3, 1, 2, 2, 2, 2, 4, 4, 4, 2
Total Effort: 35 PM
Delivery: 9/2008
slide 46
Personnel Available
2 new hires (SW Engineers) 100%
Maria 80%
Ani 20%
Jan 10%
Alainna 15%
Nolan Li 25%
Sam Carliles 25%
George Fekete 5%
Laszlo Dobos 50% (for 6 months)
slide 47
Issues/Risks
Versioning• Do we need to preserve snapshots of monthly
versions?• How will users reproduce queries on subsequent
versions?• Is it ok that a new version of the sky replaces the
previous one every month? Backup/recovery
• Will we need 3 local copies rather than 2 for safety• Is restoring from offsite copy feasible?
Handoff to IfA beyond scope of WBS shown• This will involve several PMs
Mahalo!
slide 49
Query Manager
[Screenshot callouts:
Context that query is executed in
MyDB table that query results go into
Name that this query job is given
Check query syntax
Get graphical query plan
Run query in quick (1 minute) mode
Submit query to long (8-hour) queue
Query buffer
Load one of the sample queries into query buffer]
slide 50
Query Manager
[Screenshot callouts: stored procedure arguments; SQL code for stored procedure]
slide 51
Query Manager
[Screenshot callouts:
MyDB context is the default, but other contexts can be selected
The space used and total space available
Multiple tables can be selected and dropped at once
Table list can be sorted by name, size, type
User can browse DB views, tables, functions and procedures]
slide 52
Query Manager
[Screenshot callout: the query that created this table]
slide 53
Query Manager
[Screenshot callouts: search radius; table to hold results; context to run search on]