Big Data in Science (Lessons from astrophysics)
Michael Drinkwater, UQ & CAASTRO
1. Preface
- Contributions by Jim Gray
- Astronomy data flow
2. Past Glories
- Why it was easy to be world-leading
3. Future Challenges
- Why really big data makes us worry!
CSIRO Parkes radio telescope
1. Preface: Jim Gray (Microsoft eScience)
› Much of what I discuss was already said by the late Jim Gray:
› “I have been hanging out with astronomers for about the last 10 years… I look at their telescopes… $15-20M worth of capital equipment with about 20-50 people operating the instrument… millions of lines of code are needed to analyse all this information. In fact the software cost dominates the capital expenditure!”
› Jim Gray on eScience, in The Fourth Paradigm, eds Hey, Tansley & Tolle, 2009. (emphasis added)
research.microsoft.com
Jim Gray, Microsoft Research
1. Preface: Astronomy Data Flow
Telescope → Raw Images → Output Image → Catalogues → Database → Science
2. Past Glories
› 20 years ago
- Easy to lead the world!
› UKST photographic all sky survey
- 1 image = 1 GB
- All-sky image = 1 TB
- All-sky catalogue = 100 MB
- Put online with two summer student projects
2. Past Glories
› Why did astronomy lead the way with (old) big data?
› 1) Telescopes are expensive so only a few data sources
- Data complex so only a few software packages, especially for national projects
- => easy to adopt a common data file format
› 2) Astronomers had strong computing skills
- => easy to search relatively large discovery space
CSIRO's ASKAP radio telescope with its innovative phased array receiver technology. (Image: Dragonfly Media)
2. Past Glories
› Problems with the old approach in astronomy
- Most team projects underestimate or ignore database budget
- Astronomers too independent – skeptical of computer science expertise
- Bespoke solutions not scalable or sustainable
The Anglo-Australian Telescope (Image: AAO) – used for many team projects
2. Past Glories
› WiggleZ Dark Energy Survey
- 5 year observing project
- $5M facility time + $1.5M grants + 20 team salaries
- Database $40k (donated by host as not funded)
› Success!
- 4 tests proving Einstein’s General Relativity correct
- Many other results
- 1425 citations
› Failure!
- Database failed as not supported
3. Future Challenges
› New projects so large astronomy must change…
- 1995 Schmidt photographic survey: 1 TB
- 2006 Sloan Digital Sky Survey: 25 TB
- …
- 2022-32 Large Synoptic Survey Telescope: 130 PB in 10 years
- 2030-? Square Kilometre Array radio telescope: 10 PB per day!
- More data per day than entire internet per year
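The jump in scale between these projects is easier to see as daily rates. The following is a rough sketch using only the volumes quoted above; it assumes decimal units (1 PB = 1000 TB) and spreads the LSST total evenly over its 10 years, which is a simplification.

```python
# Data volumes quoted on the slide, converted to terabytes.
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365.25

schmidt_total_tb = 1                 # 1995 Schmidt photographic survey
sdss_total_tb = 25                   # 2006 Sloan Digital Sky Survey
lsst_total_tb = 130 * TB_PER_PB      # LSST: 130 PB in 10 years
ska_daily_tb = 10 * TB_PER_PB        # SKA: 10 PB per day

# Average LSST rate, assuming the total is spread evenly over 10 years.
lsst_daily_tb = lsst_total_tb / (10 * DAYS_PER_YEAR)

# How many times faster the SKA produces data than the LSST average.
ska_to_lsst_ratio = ska_daily_tb / lsst_daily_tb

# Days the SKA needs to produce the LSST's entire 10-year volume.
ska_days_to_match_lsst = lsst_total_tb / ska_daily_tb

print(f"LSST average rate: {lsst_daily_tb:.1f} TB/day")
print(f"SKA rate is about {ska_to_lsst_ratio:.0f} times the LSST average")
print(f"SKA matches the full LSST volume in {ska_days_to_match_lsst:.0f} days")
print(f"SKA produces {ska_daily_tb // sdss_total_tb} times the SDSS total every day")
```

On these figures the SKA would produce the LSST's whole 10-year volume in about two weeks.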
The LSST: 8.4 m telescope mirror, 3.2 Gpixel camera
3. Future Challenges
› Challenges we know how to solve (Jim Gray predicted most of these)
- Realistic funding
- Scalable database structure: how to avoid I/O limits
- Must move the query to the data
- Efficient database design (Jim’s 20 questions to define functionality)
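The idea of moving the query to the data can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory table; the table name `catalogue`, its columns, and the synthetic redshift values are hypothetical. An in-memory database has no real network cost, so the number of rows returned stands in for the data that would have to be transferred.

```python
import sqlite3

# Hypothetical catalogue: 100,000 objects with synthetic redshifts in [0, 1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogue (obj_id INTEGER PRIMARY KEY, redshift REAL)")
rows = [(i, (i % 1000) / 1000) for i in range(100_000)]
conn.executemany("INSERT INTO catalogue VALUES (?, ?)", rows)

# Approach 1: bring the data to the computation.
# Every row leaves the database and is filtered by the client.
all_rows = conn.execute("SELECT obj_id, redshift FROM catalogue").fetchall()
count_client = sum(1 for _, z in all_rows if z > 0.5)
rows_moved_client = len(all_rows)

# Approach 2: bring the computation to the data.
# The database filters and counts; only one row comes back.
result = conn.execute(
    "SELECT COUNT(*) FROM catalogue WHERE redshift > ?", (0.5,)
).fetchall()
count_server = result[0][0]
rows_moved_server = len(result)

print(f"client-side: {count_client} objects, {rows_moved_client} rows moved")
print(f"server-side: {count_server} objects, {rows_moved_server} rows moved")
```

Both approaches give the same answer, but the second returns a single row however large the table grows, which is the point of the rule at petabyte scale.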
3. Future Challenges
› Nasty challenges we are yet to solve…
- Complex data mining way beyond SQL
- “Teaching software engineering to the whole community”1
- Real-time analysis for transient events
- Cross-matching different large databases in different locations
“The data collected by the SKA in a single day would take nearly two million years to play back on an iPod.” skatelescope.org
1. Mario Juric, LSST Data Management Project Scientist
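To show why cross-matching is hard at scale, here is a minimal sketch of a positional cross-match between two catalogues. The function names, the catalogue layout (id, RA, Dec in degrees), and the sample sources are all hypothetical. It compares every source in one catalogue with every source in the other, so the cost grows as the product of the two catalogue sizes; this brute-force approach is only an illustration and does not address catalogues held in different locations.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, radius_arcsec=1.0):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Each catalogue is a list of (id, ra_deg, dec_deg) tuples.
    Returns a list of (id_a, id_b) pairs. Cost is O(len(cat_a) * len(cat_b)).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None  # (id_b, separation) of the closest candidate so far
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0]))
    return matches

# Hypothetical sources: one optical/radio pair lies within 1 arcsecond.
optical = [("opt1", 150.0000, 2.0000), ("opt2", 150.0100, 2.0100)]
radio = [("rad1", 150.0001, 2.0000), ("rad2", 151.0000, 3.0000)]
matches = cross_match(optical, radio, radius_arcsec=1.0)
print(matches)
```

With billions of sources per catalogue the pairwise loop is infeasible, which is why large surveys partition the sky with spatial indexing schemes such as HEALPix or HTM before matching.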
Postscript: Jim Gray (Microsoft eScience)
› Jim Gray’s rules for large data design:
- Scientific computing is increasingly data intensive
- Solution is a “scale-out” architecture
- Bring computations to the data, rather than data to the computations
- Start the design with the 20 top questions
- Go from "working to working"
- From “Gray’s Laws: Database-centric Computing in Science”, Szalay & Blakeley, in The Fourth Paradigm, eds Hey, Tansley & Tolle, 2009.
research.microsoft.com
Jim Gray, Microsoft Research