David Fuchs SPEDDEXES 2014
CliMDDIR
Climate Model Downscaling Data for Impacts Research
David Fuchs, Ian Macadam
OEH/NARCliM
The NARCliM data:
~10km grid over Southeast Australia
3 reanalysis-forced simulations:
• 3 different RCMs
• 1950-2009 period
12 GCM-forced simulations:
• 3 RCMs forced with 4 different GCMs
• 1990-2009, 2020-2039, 2060-2079
• IPCC SRES A2 emissions scenario
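The ensemble layout above can be enumerated in a few lines. This is only a sketch: the slides give counts, not model names, so the `RCM*` and `GCM*` labels below are placeholders.

```python
from itertools import product

# Placeholder labels: the slides state only how many models there are.
RCMS = ["RCM1", "RCM2", "RCM3"]
GCMS = ["GCM1", "GCM2", "GCM3", "GCM4"]
GCM_PERIODS = ["1990-2009", "2020-2039", "2060-2079"]
REANALYSIS_PERIOD = "1950-2009"


def reanalysis_runs():
    """One reanalysis-forced run per RCM over the single long period."""
    return [(rcm, "reanalysis", REANALYSIS_PERIOD) for rcm in RCMS]


def gcm_runs():
    """Every RCM/GCM pairing for each of the three 20-year periods."""
    return list(product(RCMS, GCMS, GCM_PERIODS))


# Distinct RCM/GCM pairings, ignoring the period.
gcm_simulations = {(rcm, gcm) for rcm, gcm, _ in gcm_runs()}

print(len(reanalysis_runs()))  # 3 reanalysis-forced simulations
print(len(gcm_simulations))    # 12 GCM-forced simulations
print(len(gcm_runs()))         # 36 simulation/period combinations
```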
Profile of a CliMDDIR User
Scientist
Limited or no knowledge of climate
Own models (needs data)
ASCII/GIS
HDD under the table syndrome
Ecology
Hydrology
Health
Agriculture
Fire & hazard
5 Friendly Customers
Some initial thoughts
Big data vs user cap. & bandwidth
Demand vs resources
Arbitrary Transformations
Arbitrary Workflows
Data lifecycle
Data admin
Data explosion
Workflow Engine
Scheduler
Out of core...
Bring process to data
Keep recipes
CliMDDIR Data Cycle at a Glance
1. Collection
2. Request
3. Dataset
4. Simulations
5. Periods
6. Slice and dice
   Points/Polygon
7. Variables
8. Transformations
   Units
9. Output types

1. Who, When, What
2. Spatial
3. Software
4. Procedures & transforms
5. License, T&C, etc.

1. Publish to RDA
2. URI/DOI
3. SEO
   RDA/Google
4. Re-process
Process Audit Reproduce
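One way to read "keep recipes" together with the request steps listed above is to store each request as a small serialisable record that can be audited and re-run later. The sketch below is an assumption about how that might look, not the CliMDDIR implementation; the `Recipe` class, its field names and the sample values are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical record of one data request (field names are illustrative)."""
    collection: str
    dataset: str
    simulations: tuple
    periods: tuple
    geometry: dict            # point or polygon used to slice the grid
    variables: tuple
    transformations: tuple = ()
    output_type: str = "CSV"

    def to_json(self):
        # Sorted keys give a stable text form for storage and comparison.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self):
        # A content hash lets identical requests be recognised and re-processed.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


recipe = Recipe(
    collection="example-collection",
    dataset="example-dataset",
    simulations=("RCM1-GCM1",),
    periods=("1990-2009",),
    geometry={"type": "Point", "coordinates": [151.2, -33.9]},
    variables=("tasmax",),
    transformations=("K_to_degC",),
)
print(recipe.fingerprint()[:12])
```

Because the fingerprint depends only on the recipe content, the same request submitted twice maps to the same identifier, which is the property a re-process step would rely on.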
Front: Nectar Cloud instance
Data/Tape
Celeryd
Engine
Engine
CMS (Drupal)
Service
PostGIS
RabbitMQ
OAI-PMH
RDA
Back: AC3/OEH NOC
Model
Core
Flows
App
Celery hooks
Site/PointWorkflow
Grid/PolygonWorkflow
WorkflowContext
Transforms (*)
Interpolations (*)
CDO
Units (*)
Scratch DB
PyTables (HDF)
ResourceURI
Input drivers (*) (e.g. NARCliM)
Output drivers (*) (e.g. CSV, GIS, APSIM)
Engine Arch: 1 request = N URIs = 1 thread
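Assuming the (*) marks in the diagram denote pluggable extension points (the slides do not say so explicitly), output drivers could be registered by name and looked up per request. The registry, decorator and function names below are hypothetical, and only a CSV driver is sketched.

```python
import csv
import io

# Hypothetical registry mapping an output type name to a writer function.
OUTPUT_DRIVERS = {}


def output_driver(name):
    """Decorator that registers a writer function under an output type name."""
    def register(func):
        OUTPUT_DRIVERS[name] = func
        return func
    return register


@output_driver("CSV")
def write_csv(rows, header):
    """Serialise rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def render(output_type, rows, header):
    """Dispatch to the driver registered for the requested output type."""
    try:
        driver = OUTPUT_DRIVERS[output_type]
    except KeyError:
        raise ValueError(f"no output driver registered for {output_type!r}")
    return driver(rows, header)


print(render("CSV", [(1, 2.5), (2, 3.0)], ["day", "tasmax"]))
```

Adding a GIS or APSIM writer would then mean registering one more function, without touching the dispatch code.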
Dataset
Simulation
Period
Variable
Alias
Transform
{URI x ScratchDB}
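The "1 request = N URIs = 1 thread" idea can be illustrated by expanding a request into one resource URI per simulation, period and variable, then handling the URIs sequentially against a scratch store. This is a sketch under assumptions: the URI layout, the `expand_request` and `process_request` helpers, and the dict standing in for the PyTables scratch database are all hypothetical.

```python
from itertools import product


def expand_request(dataset, simulations, periods, variables):
    """Return one resource URI per simulation/period/variable combination."""
    return [
        f"{dataset}/{sim}/{period}/{var}"
        for sim, period, var in product(simulations, periods, variables)
    ]


def process_request(uris, load):
    """Process all URIs of one request in a single thread.

    `load` is any callable that fetches the data behind a URI; results are
    keyed by URI in a plain dict that stands in for the scratch database.
    """
    scratch = {}
    for uri in uris:
        scratch[uri] = load(uri)
    return scratch


uris = expand_request(
    "example-dataset",
    simulations=["RCM1-GCM1", "RCM2-GCM1"],
    periods=["1990-2009"],
    variables=["tasmax", "pr"],
)
scratch = process_request(uris, load=lambda uri: f"data for {uri}")
print(len(uris))  # 4
```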
Wish list:
Composition of workflows from building blocks.
Hadoop Hive indexing
(good for multi site multi resource)
MapReduce
(need more machines for that)
Extended/user/script based transformations
A security nightmare
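The first wish-list item, composing workflows from building blocks, could look roughly like the following. It is a sketch only: the `compose` helper and the two unit-conversion steps are illustrative and not part of the engine described in the slides.

```python
from functools import reduce


def compose(*steps):
    """Chain single-argument steps into one workflow, applied left to right."""
    def workflow(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return workflow


def kelvin_to_celsius(values):
    """Unit transform: temperatures in K to degrees C."""
    return [v - 273.15 for v in values]


def mean(values):
    """Reduce a series to its arithmetic mean."""
    return sum(values) / len(values)


# A small workflow built from two reusable blocks.
mean_celsius = compose(kelvin_to_celsius, mean)
print(mean_celsius([290.0, 292.0, 294.0]))
```

Restricting users to a fixed set of vetted blocks like these is one way to offer flexibility without running arbitrary user scripts, which the list above flags as a security concern.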
Final message
Treat it as a startup
Roadmap beyond launch
Market to the end user from the start
(e.g. http://launchsoon.com)
Know your deployment early
(we are still figuring this out!)