Astronomical data curation and the Wide-Field Astronomy Unit Bob Mann Wide-Field Astronomy Unit...

Astronomical data curation and the Wide-Field Astronomy Unit Bob Mann Wide-Field Astronomy Unit Institute for Astronomy School of Physics University of Edinburgh ([email protected])



Astronomical data curation and the Wide-Field Astronomy Unit

Bob Mann

Wide-Field Astronomy Unit

Institute for Astronomy

School of Physics

University of Edinburgh

([email protected])


Outline

Who we are
  Introduction to the Wide-Field Astronomy Unit

What we do
  Sky survey data curation: past, present and future
  Data curation and the Virtual Observatory

What we could do with you
  What WFAU could do for the DCC
  What the DCC could do for WFAU
  Questions




Wide-Field Astronomy Unit

Funded to curate optical and near-infrared sky survey data for UK (and European) community
Based at Royal Observatory Edinburgh
  ~35 years of sky survey data curation at ROE
Evolving data holdings:
  Photographic plates
  Digital scans of photographic plates
  Born-digital data
WFAU formed in 1999: group moved into UoE
Currently 12 grant-funded + 2 academic staff
  Mix of astronomers, IT professionals & hybrids




Sky survey data life-cycle: e.g. WFCAM

Images taken at telescope
  UKIRT, in Hawaii
Data reduction pipeline run in Cambridge
  Removes instrumental signatures
  Produces final, clean images
  Detects and characterises sources in images
Data transferred to Edinburgh
  Ingest source catalogues and image metadata into relational database, store image files on disk
  Combine data from multiple nights: new images, catalogues
  Publish release databases via web interface

On per-night basis
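
As a rough illustration of the Edinburgh end of this life-cycle, the sketch below ingests per-night source catalogues and image metadata into a relational database, then combines detections across nights. It uses Python's built-in sqlite3 module purely for convenience. The table names, column names, file paths and sample values are all hypothetical and are not the WFCAM archive schema; matching sources by name stands in for a real positional cross-match.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical, heavily simplified schema: one row per image, one per detection.
    conn.executescript("""
        CREATE TABLE image (
            image_id  INTEGER PRIMARY KEY,
            night     TEXT NOT NULL,   -- observing night
            file_path TEXT NOT NULL    -- the image file itself stays on disk
        );
        CREATE TABLE detection (
            detection_id INTEGER PRIMARY KEY,
            image_id     INTEGER NOT NULL REFERENCES image(image_id),
            source_name  TEXT NOT NULL, -- stand-in for a positional cross-match
            mag          REAL NOT NULL
        );
    """)

def ingest_night(conn, night, file_path, detections):
    """Ingest one night's image metadata and its source catalogue."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO image (night, file_path) VALUES (?, ?)",
            (night, file_path),
        )
        image_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO detection (image_id, source_name, mag) VALUES (?, ?, ?)",
            [(image_id, name, mag) for name, mag in detections],
        )

def combine_nights(conn):
    """Merge detections from all nights into one row per source."""
    return conn.execute("""
        SELECT d.source_name,
               COUNT(DISTINCT i.night) AS n_nights,
               AVG(d.mag)              AS mean_mag
        FROM detection AS d
        JOIN image AS i ON i.image_id = d.image_id
        GROUP BY d.source_name
        ORDER BY d.source_name
    """).fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
ingest_night(conn, "2006-01-15", "/data/n1.fits", [("src_a", 15.0), ("src_b", 17.5)])
ingest_night(conn, "2006-01-16", "/data/n2.fits", [("src_a", 15.5)])
merged = combine_nights(conn)
```

Here `merged` holds one row per source with the number of nights it was seen on and its mean magnitude, which is the kind of multi-night product a release database might expose.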


WFAU’s main survey archives

Past: SuperCOSMOS
  Based on digital scans of photographic plates
  Database: ~5TB: largest tables ~10^9 rows
  Images: ~35,000 user requests (10GB) per month
Present (2005-2012): WFCAM
  Near-infrared: ~700 registered users
  ~500 million rows of database results per month
  ~125GB of flat file image data per month
Near-future (2008-2020): VISTA
  ~3 x data rates/volume of WFCAM


WFAU’s future plans

Large Synoptic Survey Telescope (LSST)
  US-led public/private project
  We’re trying to get UK to buy into it
Data challenges immense
  WFCAM takes ~20TB of image data per year
  LSST will take ~20TB of image data per night:
    ~60PB images, ~8PB database (2016-2025)
LSST stimulating a lot of data management R&D in the US:
  Commercial: Google
  Academic: “Sci-DB” (M. Stonebraker, D. DeWitt)
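
The scale of these figures can be checked with a little arithmetic. The sketch below takes the numbers quoted on this slide and works out what they imply. It assumes decimal units (1PB = 1000TB), that the ~60PB image total is simply the nightly volume times the number of observing nights, and that 2016-2025 is counted as ten years inclusive; none of these assumptions is stated in the talk.

```python
TB_PER_PB = 1000  # decimal units: an assumption

wfcam_tb_per_year = 20     # quoted: WFCAM image data per year
lsst_tb_per_night = 20     # quoted: LSST image data per night
lsst_total_image_pb = 60   # quoted: LSST image total
survey_years = 2025 - 2016 + 1  # 2016-2025 counted inclusively: an assumption

# Number of observing nights implied by the quoted total image volume.
implied_nights = lsst_total_image_pb * TB_PER_PB / lsst_tb_per_night
implied_nights_per_year = implied_nights / survey_years

# Years of WFCAM imaging needed to match the quoted LSST image total.
wfcam_years_equivalent = lsst_total_image_pb * TB_PER_PB / wfcam_tb_per_year

print(f"Implied LSST observing nights: {implied_nights:.0f}")
print(f"Implied observing nights per year: {implied_nights_per_year:.0f}")
print(f"WFCAM-years to match LSST image total: {wfcam_years_equivalent:.0f}")
```

Under these assumptions the quoted totals correspond to about 3,000 observing nights, or roughly 300 per year, and one LSST night matches about a year of WFCAM imaging.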


The Virtual Observatory

Goal: an interoperable federation of all the world’s astronomical data resources
International Virtual Observatory Alliance
  Coordinates VO development worldwide
  Acts as W3C-like standards body for the VO
AstroGrid:
  Only project to have developed a full VO system


Virtual Observatory components

Registry
  Metadata for all data published to the VO
Standard data access protocols
  For tabular data, images, spectra, time series, etc.
Standard web service wrappers for application code
  Enabling asynchronous calls, workflow, etc.
Distributed data storage system
  Presenting transparent aggregated logical view to user
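
To make "standard data access protocols" concrete, the sketch below builds a request for the IVOA Simple Cone Search protocol, one such standard for tabular data. A cone search is an HTTP GET carrying a position (RA, DEC) and a search radius (SR), all in decimal degrees, and the service replies with a table of matching sources. The endpoint URL is hypothetical (in practice it would be found through a registry), and the helper name `build_cone_search_url` is invented for this illustration. No network request is made.

```python
from urllib.parse import urlencode

def build_cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search URL; base_url must not already contain a query string."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be within [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    # RA, DEC and SR are the parameter names defined by the protocol.
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return f"{base_url}?{query}"

# Hypothetical service endpoint, used only to show the shape of the request.
url = build_cone_search_url(
    "https://archive.example.org/conesearch", 180.0, -1.5, 0.1
)
print(url)
```

Because every compliant archive accepts the same three parameters, a client written once can query any of them, which is the point of agreeing the protocol at the standards-body level.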


Curation challenges for WFAU

More data analysis services in the data centre
  Data volumes too large for user download
  WFAU must provide data analysis services & hardware
Integration of data and knowledge
  Third-party annotations which can be used in queries
    “Object X in database Y is a quasar”
    “X-ray source A is the same object as radio source B”
  Better linkage between archives and online literature
Keeping staff up to date on technologies/techniques
  Mostly learn by doing: do we make best choices?
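
One way to picture third-party annotations that can be used in queries is as a separate table keyed on source identifiers, joined to the archive's own catalogue at query time. The sketch below uses Python's built-in sqlite3 module. The schema, the annotator names and the predicate vocabulary (`object_type`, `same_as`) are all hypothetical; this is not a description of any annotation service WFAU has built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (              -- the archive's own catalogue
        source_id INTEGER PRIMARY KEY,
        ra_deg    REAL NOT NULL,
        dec_deg   REAL NOT NULL
    );
    CREATE TABLE annotation (          -- assertions contributed by third parties
        annotation_id INTEGER PRIMARY KEY,
        source_id     INTEGER NOT NULL REFERENCES source(source_id),
        annotator     TEXT NOT NULL,
        predicate     TEXT NOT NULL,   -- e.g. 'object_type' or 'same_as'
        value         TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO source (source_id, ra_deg, dec_deg) VALUES (?, ?, ?)",
    [(1, 10.0, -5.0), (2, 11.0, -5.5), (3, 12.0, -6.0)],
)
conn.executemany(
    "INSERT INTO annotation (source_id, annotator, predicate, value) VALUES (?, ?, ?, ?)",
    [
        (1, "group_a", "object_type", "quasar"),
        (2, "group_b", "same_as", "radio_catalogue:B"),
        (3, "group_a", "object_type", "star"),
    ],
)

def sources_annotated_as(conn, object_type):
    """Return catalogue rows that some third party has labelled with object_type."""
    return conn.execute("""
        SELECT s.source_id, s.ra_deg, s.dec_deg, a.annotator
        FROM source AS s
        JOIN annotation AS a ON a.source_id = s.source_id
        WHERE a.predicate = 'object_type' AND a.value = ?
        ORDER BY s.source_id
    """, (object_type,)).fetchall()

quasars = sources_annotated_as(conn, "quasar")
```

Keeping the annotator alongside each assertion lets a user decide whose claims to trust when filtering, which matters once annotations come from outside the data centre.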




WFAU and DCC: What we can do for you

Case studies, exemplars, etc.
  WFAU is a well-established, competent group
  Astronomy is a relatively small, cohesive community, used to interdisciplinary collaboration
  Astronomers are early adopters of IT and recognise value of data curation
  VO is a rich, functional e-Science infrastructure
Collaborations to date:
  Raj Bose: distributed annotation service
  James Cheney: paper on data centre security


WFAU and DCC: What you can do for us

Policy advice
  Increasingly need to convince research councils of benefits of long-term data curation (cost/benefit)
Technical advice, from DCC or its Associates
  Should we use iRODS for LSST?
  Do any XML databases have decent performance?
  Do the VO metadata standards make sense?
Curation manual
  When will the rest appear?
Training
  e.g. NeSC course on relational database design


WFAU and DCC: Questions

What is the DCC’s model for collaboration?
  Can’t collaborate with everyone on everything
Scientists & digital librarians live in different worlds: how do you bridge that divide?
  Interdisciplinary work requires sustained interaction
What do you want from scientific data curators?
  What can you offer us in return?
Few of my colleagues know anything about the DCC
  Does that surprise you?