Gaia@AIP services
gaia.ari.uni-heidelberg.de/gaia-workshop-2016/...

Anastasia Galkin, Gal Matijevic, Kristin Riebe, Jochen Klar, Harry Enke
Gaia Data Workshop, 23.11.2016
Gaia@AIP services, gaia.aip.de

Outline

• Astrophysical data at AIP services

• The setup: Daiquiri, PaQu and the parallel MariaDB databases

• Future development

Terabytes of astrophysical data

• Observational data

– Gaia@AIP (gaia.aip.de), incl. Tycho-2 and RAVE for crossmatching

– RAVE – radial velocity project

– APPLAUSE – historical photometric plates

• Simulation output

– CosmoSim.org: MultiDark, Bolshoi

Why SQL?

● Very clearly structured data

● Retrieval of specific subsets

● Need for complex queries with astrophysical functions

● Quasi-standard in astronomy

SQL (or ADQL) is the de facto standard in astronomy.

It's a question of size...

● Currently hosting 45 TB of publicly available data

● Number of columns: up to 60

● Number of entries:

– CosmoSim has ca. 40TB with 354 billion rows

– Gaia DR1 has 244 GB with 2 billion rows (TGAS: 2 million)

… and the complexity of the queries.

● Selection of specific properties on billions of rows

● Combined with specific astrophysical functions

The full picture

[Figure: architecture diagram. Labels: User; Daiquiri; PaQu; Query queue; MariaDB (Spider, Federated); Spider engine; six MariaDB nodes; DBIngestor / AsciiIngest; Data curator]

Queries running in parallel

MariaDB

• Data is distributed on 10 shard nodes with MyISAM engine

• orchestrated by a head node with Spider engine

• Random number plugin

PaQu

• reformulates MySQL queries

– joins, aggregates, functions

– e.g. count rows

● count on each node

● sum on head node

• head node collects data via federated tables
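
The count example above can be sketched in Python. This is only a minimal illustration of the rewrite idea, not PaQu's actual implementation: in-memory SQLite databases stand in for the MariaDB shard nodes, rows are assigned round-robin, the shards are queried one after another rather than concurrently, and the table name `stars`, its columns, and all function names are hypothetical.

```python
import sqlite3

N_SHARDS = 10  # the slides mention 10 shard nodes


def make_shards(rows, n_shards=N_SHARDS):
    """Spread rows round-robin over in-memory SQLite stand-ins for shard nodes."""
    shards = [sqlite3.connect(":memory:") for _ in range(n_shards)]
    for conn in shards:
        conn.execute("CREATE TABLE stars (source_id INTEGER, parallax REAL)")
    for i, row in enumerate(rows):
        shards[i % n_shards].execute("INSERT INTO stars VALUES (?, ?)", row)
    return shards


def sharded_count(shards, min_parallax):
    """Rewritten COUNT(*): count on each shard, then sum the partial counts."""
    per_shard_sql = "SELECT COUNT(*) FROM stars WHERE parallax > ?"
    partial = [
        conn.execute(per_shard_sql, (min_parallax,)).fetchone()[0]
        for conn in shards
    ]
    # The "head node" step: combine the per-shard results.
    return sum(partial), partial


# Hypothetical test data: 1000 rows with parallax values 0.0, 0.1, ..., 99.9.
rows = [(i, i / 10) for i in range(1000)]
shards = make_shards(rows)
total, partial = sharded_count(shards, 49.95)
print(total, partial)
```

The same pattern works for any aggregate that can be combined from partial results (for example SUM, MIN, MAX). An average would need both a per-shard sum and a per-shard count before the final division.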

Daiquiri

A customizable framework for database publication

• SQL queries

– SQL query assistance

– Can be used with PaQu with the Spider engine setup

– Query result table viewer, plotting tool

• UWS 1.1 compliant, released 1 month ago

• User database space

• Download of data in different formats

• Project documentation via Wordpress

• Administration tools:

– Database management

– User management

– Contact messages

– Meetings management

www.cosmosim.org

www.rave-survey.org

www.plate-archive.org

www.4most.eu

gaia.aip.de

CosmoDB @ Tartu

JUBILEE @ Madrid

GREGOR project

CLUES user management

Workshops: Dwarfs, 4Most

Future development

• Daiquiri: reloaded in Python with the Django framework

• Improved rewrite process for parallel queries

• Improved queue for query jobs

• Communication between node daemons to improve performance

• Frontend: MySQL, ADQL

• Backends: MySQL/MariaDB (sharding), Postgres

Coming up in 2017

• Deployment at AIP: newest MariaDB versions with ARIA, SPIDER and FEDERATED database engines

Please create an account for the hands-on session at

gaia.aip.de

escience.aip.de

Anastasia Galkin [email protected]