Ashish Mahabal aam at astro.caltech.edu
Center for Data Driven Discovery, Caltech
IAU 325: AstroInformatics, Sorrento, Italy
2016-10-23
From Sky to Earth: Data Science Methodology Transfer
JPL Data Science Initiative
NASA Advanced Information Systems Technology Program (AIST)
Western States Water Architecture Study
EarthCube
VIFI
Broad Outline
• Similarities in Big data of Astro and Earth Sciences
• The Hydrology case
• Example projects from BigSkyEarth
• EarthCube - the Earth VO
• Domain Adaptation
• Summary
Generic Big Data
• Complex rather than just voluminous
• Real-time needs
• Complexity in terms of
• spatial distribution
• spatial and temporal resolution,
• time epochs (number of and irregularity),
• coverage (overlap)
Volume, Velocity, Volatility, Veracity, Value, …
Big Data - Astronomy
[Figure: peak luminosity (erg s−1; M_V) versus characteristic timescale (days) for transient classes: classical novae, luminous red novae, thermonuclear supernovae, core-collapse supernovae, luminous supernovae, .Ia explosions, Ca-rich transients]
• Complex rather than just voluminous (catalogs, spectra, polarimetry)
• Real-time needs (e.g. transient classification)
• Understanding in terms of existing models (e.g. Tabby’s star, HB stars)
• Complexity in terms of
• spatial distribution (data archives at different locations)
• spatial and temporal resolution (HST~0”.1 -> TESS~10”)
• time epochs (number of and irregularity) (SDSS - Kepler)
• coverage (overlap) (DLS -> Gaia)
J Cooke
Big Data - Earth Science
• In-situ measurements
• Satellite-based observations
• Models (predictive, computational)
• Real-time needs (e.g. predicting flash floods, ephemeral water flow)
• Complexity in terms of
• spatial and temporal resolution (wells, snow, moisture, underground water, overground water)
• time epochs (number of and irregularity) (snow, wells)
• coverage (overlap)
Big Data - Earth Science
Tools: Python, R, GrADS, IDL, Matlab, ArcGIS, HydroDesktop, and Google’s Earth Engine
Multi-dimensional indexing: GeoMesa, GeoWave
Ontologies
Astronomical Objects
PDS: Steve Hughes, Dan Crichton
PDS -> Earth Science (NASA)
Multiplicity of ontologies
Meta-data (and ontologies) are good
Too many, or non-conforming, systems may be hurtful
ONTOLOG, EarthCube, OGC, SWEET
The Hydrology Case
Parallels in Earth Science and astronomy methodology
• Water vapour
• Precipitation
• Surface Water
• Ground Water
• Snow
• Evaporation
• Rivers/Lakes/…
GRACE AQUA
Next few slides from ARSET
From less than one to up to two measurements per day
Data latency: under 3 hours to 3 months
TESS will have downlinks every 15 days
Very distributed and not talking enough to each other
Water
JPL Data Science Initiative
NASA Advanced Information Systems Technology Program (AIST)
Western States Water Architecture Study
A Scalable Data Processing System for Hydrological Science
Input-Forcing (e.g., GPM)
For Data Assimilation (e.g., MODSCAG)
Standard Reports; Ad Hoc Queries and Custom Reports
Snow-Water Equivalent; Surface Water; Ground Water
Single-Month Estimates; Short and Long-Term Trends
Research; Applications; Decision Support
Data Science Infrastructure (Tools, Services, Methods for Massive Data Analysis)
(Web-Based Interface)
Western States Water Mission (WSWM)
hydrological state estimation on water availability, at 3 km² resolution for the Western US
timely actionable information
a close collaboration of hydrological modeling and data science expertise in a mission-style project architecture
WaterTrek: an interactive, web-based analytics environment
Regularization of spatial resolution
Time series regularization
Integration of datasets
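As an illustration of what time series regularization can mean in practice, here is a minimal sketch that resamples an irregularly sampled series (e.g. well or snow readings) onto a regular time grid. Linear interpolation is an assumption made for this sketch; the slides do not say which method WSWM uses, and the function name `regularize` is hypothetical.

```python
from bisect import bisect_left


def regularize(times, values, step):
    """Linearly interpolate an irregular series onto a regular grid.

    Assumptions (not from the slides): `times` is strictly increasing,
    and the grid runs from times[0] towards times[-1] in increments of
    `step`, with no extrapolation beyond the observed range.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    if step <= 0:
        raise ValueError("step must be positive")

    n_steps = int((times[-1] - times[0]) / step)
    grid, resampled = [], []
    for k in range(n_steps + 1):
        # Clamp to the last observation to guard against float round-off.
        t = min(times[0] + k * step, times[-1])
        i = bisect_left(times, t)  # first index with times[i] >= t
        if times[i] == t:
            v = values[i]  # grid point coincides with an observation
        else:
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)  # fractional position in the gap
            v = values[i - 1] + w * (values[i] - values[i - 1])
        grid.append(t)
        resampled.append(v)
    return grid, resampled
```

Once several layers share the same grid they can be compared or combined point by point. Linear interpolation across long gaps can be misleading, so a real pipeline would probably also track gap lengths or uncertainties.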
WSWM
Pacific Northwest
California
Great Basin
Lower Colorado
Upper Colorado
WSWM domain: Continental US west of divide
Franklin D Roosevelt Lake
Lake Koocanusa
Shasta Lake
Lake Mead
Lake Powell
Contains 5 of the 15 largest US reservoirs
Getting ready for SWOT
Actual model resolution
Largest rivers
658,702 river reaches (1,410,328 total length)
7,532 gauges (many now inactive)
Hyper-resolution with assimilation
Facilitates informed decisions at the local level
High-resolution modeling over large spatial domain
High Level Concept of Data Management and Data Analytics
COST’s BigSkyEarth: First Training School at Oberpfaffenhofen, 2016
https://github.com/marcoq/BSE_TS2016_Oberpfaffenhofen/
EarthCube
NSF, 2011
Cyberinfrastructure: sharing, visualization, analysis
Interoperability standards, better integration, democratizing data
JPL, Caltech: Scalable Arch, Test Environment
EarthCube Funded Projects
• BCube: Broker for Next generation Geoscience (mediating interactions)
• Integrating Long-Tail Data and Models
• Scalable Community Driven Architecture
• (SG Djorgovski, E Law, D Crichton, A Mahabal)
• ECITE (Graves, Yang, Law, Djorgovski, Mahabal)
• … (other Building Blocks)
Scalable Community Driven Architecture
• Identify Stakeholders, key use cases
• Incorporate cross-agency informatics efforts to capture architectural drivers, principles, models
• Roadmap for extensible and sustainable participation coherent with cyberinfrastructure
• Design architecture, data intensive system leading to discovery in the big data era
Team: S.Caltagirone, D.Crichton, S.G.Djorgovski, T.Huang, S.Hughes, E.Law, A.Mahabal, D.Pilone, T.Pilone
EarthCube Integration and Test Environment (ECITE)
• Seamless federated system of scalable, location-independent resources
• Compute and storage with minimal administration
• Integration, test, and evaluation
• Share ideas, concepts, experiments
SarvabhaumMandlik
Caltech + GMU
Domain Adaptation
With Jingling Li, Samarth Vaijanapurkar, Brian Bui, …
Feature Correlations
Sample from Drake et al.
RF, GFK, CODA, …
• Examine the baseline performance for three combinations of data using random forest blindly:
• Source to Target
• Source + Target to Target
• Target to Target
• Compare performance with Domain Adaptation
• Misclassified objects and outliers
To be used with the various hydrology layers having irregular time series
Aspects to be explored through VIFI
Talukdar, Mahabal, Djorgovski, Crichton
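The three baseline combinations can be sketched as below. This is a toy under stated assumptions, not the authors' pipeline: a nearest-centroid classifier stands in for the random forest so that the example needs only the standard library, "Target to Target" is read as training on one part of the target data and testing on a held-out part, and all function names are hypothetical.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): per-class mean vectors."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, v in enumerate(row):
            sums[label][j] += v
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, row):
    """Return the label of the centroid closest to `row` (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def accuracy(train, test):
    """Fit on `train`, then score the fraction of correct labels on `test`."""
    (X_train, y_train), (X_test, y_test) = train, test
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, r) == lab for r, lab in zip(X_test, y_test))
    return hits / len(y_test)


def baselines(source, target_train, target_test):
    """Score the three training combinations on the same held-out target set."""
    Xs, ys = source
    Xt, yt = target_train
    return {
        "source_to_target": accuracy(source, target_test),
        "source_plus_target_to_target": accuracy((Xs + Xt, ys + yt), target_test),
        "target_to_target": accuracy(target_train, target_test),
    }
```

Each dataset is an `(X, y)` pair of feature rows and labels. Scoring all three combinations on the same held-out target set gives the reference numbers against which a domain adaptation method would then be compared.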
Earth to Sky
Pokemon Go to Transient Go
Binary Transient Brokers combined with AR
CRTS + LSST; Gaia?
SUNY Oswego CS undergrads
Summary
• Many parallels in Astro- and Earth-sciences
• In Earth Science many datasets still analyzed separately
• One big difference: intervention possible
• water distribution
• Citizen Science not explored enough
• monitoring presence of lead at different locations
• Many other use cases being explored