Astronomical data curation and the Wide-Field Astronomy Unit
Bob Mann
Wide-Field Astronomy Unit
Institute for Astronomy
School of Physics
University of Edinburgh
2/15
Outline
Who we are
  Introduction to the Wide-Field Astronomy Unit
What we do
  Sky survey data curation: past, present and future
  Data curation and the Virtual Observatory
What we could do with you
  What WFAU could do for the DCC
  What the DCC could do for WFAU
  Questions
4/15
Wide-Field Astronomy Unit
Funded to curate optical and near-infrared sky survey data for the UK (and European) community
Based at Royal Observatory Edinburgh
  ~35 years of sky survey data curation at ROE
Evolving data holdings:
  Photographic plates
  Digital scans of photographic plates
  Born-digital data
WFAU formed in 1999: group moved into UoE
  Currently 12 grant-funded + 2 academic staff
  Mix of astronomers, IT professionals & hybrids
6/15
Sky survey data life-cycle: e.g. WFCAM
Images taken at telescope
  UKIRT, in Hawaii
Data reduction pipeline run in Cambridge
  Removes instrumental signatures
  Produces final, clean images
  Detects and characterises sources in images
Data transferred to Edinburgh
  Ingest source catalogues and image metadata into relational database, store image files on disk
  Combine data from multiple nights: new images, catalogues
  Publish release databases via web interface
On a per-night basis
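The Edinburgh ingest step can be illustrated with a minimal sketch. It assumes a much simplified two-table schema and uses SQLite purely for convenience. The table names, column names, file paths and the `build_archive` helper are all hypothetical and are not taken from the WFAU archive itself.

```python
import sqlite3


def build_archive(nightly_batches):
    """Ingest per-night image metadata and source catalogues (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE image (
            image_id  INTEGER PRIMARY KEY,
            night     TEXT NOT NULL,   -- observing night as an ISO date
            filter    TEXT NOT NULL,
            file_path TEXT NOT NULL    -- pixel data stays on disk, only the path is stored
        );
        CREATE TABLE source (
            source_id INTEGER PRIMARY KEY,
            image_id  INTEGER NOT NULL REFERENCES image(image_id),
            ra_deg    REAL NOT NULL,
            dec_deg   REAL NOT NULL,
            mag       REAL
        );
    """)
    for batch in nightly_batches:
        for img in batch["images"]:
            # One metadata row per image file.
            cur = conn.execute(
                "INSERT INTO image (night, filter, file_path) VALUES (?, ?, ?)",
                (batch["night"], img["filter"], img["file_path"]),
            )
            image_id = cur.lastrowid
            # One row per source detected in that image.
            conn.executemany(
                "INSERT INTO source (image_id, ra_deg, dec_deg, mag) VALUES (?, ?, ?, ?)",
                [(image_id, ra, dec, mag) for ra, dec, mag in img["sources"]],
            )
    conn.commit()
    return conn


# Made-up example data for two nights.
batches = [
    {"night": "2006-01-01", "images": [
        {"filter": "K", "file_path": "/data/20060101/img1.fits",
         "sources": [(10.0, -5.0, 15.2), (10.1, -5.1, 16.0)]}]},
    {"night": "2006-01-02", "images": [
        {"filter": "K", "file_path": "/data/20060102/img1.fits",
         "sources": [(10.0, -5.0, 15.3)]}]},
]

conn = build_archive(batches)
n_images = conn.execute("SELECT COUNT(*) FROM image").fetchone()[0]
n_sources = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
sources_per_night = conn.execute(
    "SELECT i.night, COUNT(*) FROM source s JOIN image i ON i.image_id = s.image_id "
    "GROUP BY i.night ORDER BY i.night"
).fetchall()
```

Keeping only file paths in the database mirrors the split described on the slide: catalogues and metadata go into relational tables, while the image files themselves stay on disk.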
7/15
WFAU’s main survey archives
Past: SuperCOSMOS
  Based on digital scans of photographic plates
  Database: ~5TB; largest tables ~10^9 rows
  Images: ~35,000 user requests (10GB) per month
Present (2005-2012): WFCAM
  Near-infrared: ~700 registered users
  ~500 million rows of database results per month
  ~125GB of flat file image data per month
Near-future (2008-2020): VISTA
  ~3 x data rates/volume of WFCAM
8/15
WFAU’s future plans
Large Synoptic Survey Telescope
  US-led public/private project
  We’re trying to get UK to buy into it
Data challenges immense
  WFCAM takes ~20TB of image data per year
  LSST will take ~20TB of image data per night:
    ~60PB images, ~8PB database (2016-2025)
LSST stimulating a lot of data management R&D in the US:
  Commercial: Google
  Academic: “Sci-DB” (M. Stonebraker, D. DeWitt)
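A rough back-of-envelope check shows how ~20TB per night could accumulate to ~60PB of images. The figure of 300 observing nights per year and the ten-year duration (read off the 2016-2025 range) are assumptions made for this sketch, not numbers stated on the slide, and decimal units (1PB = 1000TB) are assumed.

```python
TB_PER_PB = 1000  # assumption: decimal units


def survey_image_volume_pb(tb_per_night, nights_per_year, years):
    """Total image volume in PB for a constant nightly data rate."""
    return tb_per_night * nights_per_year * years / TB_PER_PB


lsst_tb_per_night = 20         # from the slide
wfcam_tb_per_year = 20         # from the slide
assumed_nights_per_year = 300  # assumption: usable observing nights per year
assumed_survey_years = 10      # assumption: 2016-2025 inclusive

lsst_total_pb = survey_image_volume_pb(
    lsst_tb_per_night, assumed_nights_per_year, assumed_survey_years
)

# Ratio of yearly image data rates, LSST to WFCAM, under the same assumption.
rate_ratio = lsst_tb_per_night * assumed_nights_per_year / wfcam_tb_per_year
```

Under these assumptions the total is 60PB, in line with the slide, and LSST's yearly image rate is about 300 times that of WFCAM.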
9/15
The Virtual Observatory
Goal: an interoperable federation of all the world’s astronomical data resources
International Virtual Observatory Alliance
  Coordinates VO development worldwide
  Acts as W3C-like standards body for the VO
AstroGrid:
  Only project to have developed a full VO system
10/15
Virtual Observatory components
Registry
  Metadata for all data published to the VO
Standard data access protocols
  For tabular data, images, spectra, time series, etc.
Standard web service wrappers for application code
  Enabling asynchronous calls, workflow, etc.
Distributed data storage system
  Presenting transparent aggregated logical view to user
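As one illustration of a standard data access protocol for tabular data, the IVOA Simple Cone Search protocol takes a sky position and search radius as the HTTP query parameters `RA`, `DEC` and `SR` (all in decimal degrees) and returns a table of matching sources. The sketch below only builds such a request URL. The service address is a placeholder and `cone_search_url` is a hypothetical helper, not part of any VO library.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL (all angles in decimal degrees)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must lie in [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must lie in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Pick the separator according to how the base URL already ends.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return f"{base_url}{sep}{query}"


# Placeholder service address, used for illustration only.
url = cone_search_url("https://archive.example.org/conesearch", 180.0, -1.5, 0.1)
params = parse_qs(urlparse(url).query)
```

Because every compliant service accepts the same three parameters, a client written once can query any archive listed in the registry, which is the point of standardising the protocols.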
11/15
Curation challenges for WFAU
More data analysis services in the data centre
  Data volumes too large for user download
  WFAU must provide data analysis services & hardware
Integration of data and knowledge
  Third-party annotations which can be used in queries
    “Object X in database Y is a quasar”
    “X-ray source A is the same object as radio source B”
  Better linkage between archives and online literature
Keeping staff up to date on technologies/techniques
  Mostly learn by doing – do we make best choices?
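The idea of third-party annotations that can be used in queries can be sketched as an extra table joined to the archive's source table. This is only one possible design, again using SQLite for convenience. The schema, the annotator names and the `sources_annotated_as` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (
        source_id INTEGER PRIMARY KEY,
        ra_deg    REAL,
        dec_deg   REAL,
        mag       REAL
    );
    -- Assertions made by third parties, stored apart from the curated catalogue.
    CREATE TABLE annotation (
        annotation_id INTEGER PRIMARY KEY,
        source_id     INTEGER NOT NULL REFERENCES source(source_id),
        annotator     TEXT NOT NULL,
        assertion     TEXT NOT NULL   -- e.g. 'quasar'
    );
""")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [(1, 10.0, -5.0, 18.2), (2, 10.1, -5.1, 16.0), (3, 10.2, -5.2, 19.5)],
)
conn.executemany(
    "INSERT INTO annotation (source_id, annotator, assertion) VALUES (?, ?, ?)",
    [(1, "user_a", "quasar"), (3, "user_b", "quasar"), (2, "user_a", "star")],
)


def sources_annotated_as(conn, assertion, max_mag):
    """IDs of sources carrying the given annotation and brighter than max_mag."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.source_id
        FROM source AS s
        JOIN annotation AS a ON a.source_id = s.source_id
        WHERE a.assertion = ? AND s.mag < ?
        ORDER BY s.source_id
        """,
        (assertion, max_mag),
    ).fetchall()
    return [row[0] for row in rows]


bright_quasars = sources_annotated_as(conn, "quasar", 19.0)
all_quasars = sources_annotated_as(conn, "quasar", 99.0)
```

Storing annotations in their own table, with the annotator recorded, keeps third-party claims separate from the curated catalogue while still letting a single query combine both.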
13/15
WFAU and DCC: What we can do for you
Case studies, exemplars, etc.
  WFAU is a well-established, competent group
  Astronomy is a relatively small, cohesive community, used to interdisciplinary collaboration
  Astronomers are early adopters of IT and recognise value of data curation
  VO is a rich, functional e-Science infrastructure
Collaborations to date:
  Raj Bose – distributed annotation service
  James Cheney – paper on data centre security
14/15
WFAU and DCC: What you can do for us
Policy advice
  Increasingly need to convince research councils of benefits of long-term data curation – cost/benefit
Technical advice – from DCC or its Associates
  Should we use iRODS for LSST?
  Do any XML databases have decent performance?
  Do the VO metadata standards make sense?
Curation manual
  When will the rest appear?
Training
  e.g. NeSC course on relational database design
15/15
WFAU and DCC: Questions
What is the DCC’s model for collaboration?
  Can’t collaborate with everyone on everything
Scientists & digital librarians live in different worlds: how do you bridge that divide?
  Interdisciplinary work requires sustained interaction
What do you want from scientific data curators?
  What can you offer us in return?
Few of my colleagues know anything about the DCC
  Does that surprise you?