David Fuchs SPEDDEXES 2014

CliMDDIR: Climate Model Downscaling Data for Impacts Research. David Fuchs, Ian Macadam (OEH/NARCliM)


NARCliM and the NSW OEH: Challenges and successes of making large datasets available to new users


Page 1: David Fuchs SPEDDEXES 2014

CliMDDIR

Climate Model Downscaling Data for Impacts Research

David Fuchs, Ian Macadam

OEH/NARCliM

Page 2: David Fuchs SPEDDEXES 2014

The NARCliM data:

~10 km grid over Southeast Australia

3 reanalysis-forced simulations:

• 3 different RCMs

• 1950-2009 period

12 GCM-forced simulations:

• 3 RCMs forced with 4 different GCMs

• 1990-2009, 2020-2039, 2060-2079

• IPCC SRES A2 emissions scenario

Page 3: David Fuchs SPEDDEXES 2014

Profile of a CliMDDIR User

Scientist

Limited or no knowledge of climate

Own models (needs data)

ASCII/GIS

HDD under the table syndrome

Ecology

Hydrology

Health

Agriculture

Fire & hazard

5 Friendly Customers

Page 4: David Fuchs SPEDDEXES 2014

Some initial thoughts

Big data vs user cap. & bandwidth

Demand vs resources

Arbitrary Transformations

Arbitrary Workflows

Data lifecycle

Data admin

Data explosion

Page 5: David Fuchs SPEDDEXES 2014

Some initial thoughts

Workflow Engine

Scheduler

Out of core...

Bring process to data

Keep recipes
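
The "out of core" and "bring process to data" ideas can be illustrated with a minimal sketch, assuming a long series that is streamed in fixed-size chunks instead of being loaded whole. The function names, the chunk size, and the synthetic series are hypothetical and are not taken from CliMDDIR.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so only one chunk is held in memory at a time."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def streaming_mean(chunks: Iterable[List[float]]) -> float:
    """Accumulate a running sum and count; the full series is never stored."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# Hypothetical stand-in for a large on-disk series (a generator, not a list).
series = (float(day % 10) for day in range(1000))
mean_value = streaming_mean(read_in_chunks(series, chunk_size=128))
```

Only the small result leaves the machine that holds the data, which is the point of moving the process rather than the dataset.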

Page 6: David Fuchs SPEDDEXES 2014

CliMDDIR Data Cycle at a Glance

1. Collection

2. Request

3. Dataset

4. Simulations

5. Periods

6. Slice and dice

• Points/Polygon

7. Variables

8. Transformations

• Units

9. Output types

1. Who, When, What

2. Spatial

3. Software

4. Procedures & transforms

5. License, T&C, etc.

1. Publish to RDA

2. URI/DOI

3. SEO

• RDA/Google

4. Re-process

Process, Audit, Reproduce
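
One way to picture the process, audit, reproduce cycle is a request stored as a serialisable record that can be fingerprinted and replayed later. This is a minimal sketch under stated assumptions: the `Request` class, its field names, and all example values other than the 1990-2009 period are hypothetical and do not describe the actual CliMDDIR schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one data request (what was asked, by whom, when)."""
    collection: str
    dataset: str
    simulations: Tuple[str, ...]
    periods: Tuple[str, ...]
    points: Tuple[Tuple[float, float], ...]  # (lat, lon) pairs
    variables: Tuple[str, ...]
    transformations: Tuple[str, ...]
    output_type: str
    requested_by: str
    requested_on: str

    def to_json(self) -> str:
        # sort_keys gives a stable serialisation, so equal requests hash equally.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

    @classmethod
    def from_json(cls, text: str) -> "Request":
        raw = json.loads(text)
        # JSON has no tuples, so lists are converted back explicitly.
        return cls(
            collection=raw["collection"],
            dataset=raw["dataset"],
            simulations=tuple(raw["simulations"]),
            periods=tuple(raw["periods"]),
            points=tuple((lat, lon) for lat, lon in raw["points"]),
            variables=tuple(raw["variables"]),
            transformations=tuple(raw["transformations"]),
            output_type=raw["output_type"],
            requested_by=raw["requested_by"],
            requested_on=raw["requested_on"],
        )


original = Request(
    collection="NARCliM",
    dataset="example-dataset",            # hypothetical name
    simulations=("RCM1-GCM1",),           # hypothetical label
    periods=("1990-2009",),
    points=((-33.87, 151.21),),
    variables=("example_variable",),      # hypothetical name
    transformations=("units:K->degC",),   # hypothetical notation
    output_type="CSV",
    requested_by="example.user",
    requested_on="2014-01-01",
)
replayed = Request.from_json(original.to_json())
```

Storing the JSON alongside the published output would let the same request be re-processed later, and the fingerprint gives a cheap check that two stored requests are identical.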

Page 7: David Fuchs SPEDDEXES 2014
Page 8: David Fuchs SPEDDEXES 2014
Page 9: David Fuchs SPEDDEXES 2014

Front: Nectar Cloud instance

Data/Tape

Celeryd

Engine

Engine

CMS (Drupal)

Service

PostGIS

RabbitMQ

OAI-PMH

RDA

Back: AC3/OEH NOC
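
The front/back split above (a web front end handing requests to engine workers through a message broker) can be sketched with the standard library alone. This is only an illustration: `queue.Queue` stands in for RabbitMQ, plain threads stand in for Celery workers, and the job fields and the `engine_worker` and `submit` names are hypothetical.

```python
import queue
import threading
from typing import Dict, Optional

request_queue: "queue.Queue[Optional[dict]]" = queue.Queue()  # stand-in for the broker
results: Dict[int, str] = {}
results_lock = threading.Lock()


def engine_worker() -> None:
    """Back-end worker: take jobs off the queue until a None sentinel arrives."""
    while True:
        job = request_queue.get()
        try:
            if job is None:
                return
            output = f"processed {job['variable']} for {job['period']}"
            with results_lock:
                results[job["id"]] = output
        finally:
            request_queue.task_done()


def submit(job: dict) -> None:
    """Front end: enqueue a request and return immediately."""
    request_queue.put(job)


# Two workers, mirroring the two Engine boxes on the slide.
workers = [threading.Thread(target=engine_worker) for _ in range(2)]
for worker in workers:
    worker.start()

submit({"id": 1, "variable": "var_a", "period": "1990-2009"})
submit({"id": 2, "variable": "var_b", "period": "2060-2079"})

request_queue.join()           # wait until both jobs are done
for _ in workers:
    request_queue.put(None)    # one sentinel per worker
for worker in workers:
    worker.join()
```

The front end never waits on the processing itself, which is what lets the user-facing site stay responsive while long extractions run close to the data.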

Page 10: David Fuchs SPEDDEXES 2014

ModelCoreFlows

App

Celery hooks

Site/PointWorkflow

Grid/PolygonWorkflow

WorkflowContext

Transforms (*)

Interpolations (*)

CDO

Units (*)

Scratch DB

PyTables (HDF)

ResourceURI

Input drivers (*) (e.g. NARCliM)

Output drivers (*) (e.g. CSV, GIS, APSIM)

Engine Arch: 1 request = N URIs = 1 thread
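
The pluggable pieces marked (*) and the "1 request = N URIs = 1 thread" rule can be sketched as a small driver registry plus a single thread that walks every URI of one request. This is a minimal sketch under stated assumptions: the registry, the `narclim://` URI scheme, and the fake `read_uri` input driver are hypothetical and are not the engine's real interfaces.

```python
import csv
import io
import threading
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]
OUTPUT_DRIVERS: Dict[str, Callable[[Rows], str]] = {}


def output_driver(name: str):
    """Decorator that registers an output driver under a short name."""
    def register(func: Callable[[Rows], str]) -> Callable[[Rows], str]:
        OUTPUT_DRIVERS[name] = func
        return func
    return register


@output_driver("csv")
def write_csv(rows: Rows) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_uri(uri: str) -> Dict[str, object]:
    """Hypothetical input driver: fabricates one value per URI for illustration."""
    return {"uri": uri, "value": float(len(uri))}


def run_request(uris: List[str], output_type: str) -> str:
    """All N URIs of a request are read sequentially, then written once."""
    rows = [read_uri(uri) for uri in uris]
    return OUTPUT_DRIVERS[output_type](rows)


def run_in_one_thread(uris: List[str], output_type: str) -> str:
    """One request maps to exactly one thread, however many URIs it holds."""
    holder: Dict[str, str] = {}
    thread = threading.Thread(
        target=lambda: holder.update(output=run_request(uris, output_type))
    )
    thread.start()
    thread.join()
    return holder["output"]


report = run_in_one_thread(["narclim://a", "narclim://bb"], "csv")
```

Adding a GIS or APSIM writer would then mean registering one more function, without touching `run_request`.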

Page 11: David Fuchs SPEDDEXES 2014

Engine Arch: 1 request = N URIs = 1 thread

Dataset

Simulation

Period

Variable

Alias

Transform

{URI x ScratchDB}
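
How one request fans out into N URIs can be pictured as a Cartesian product over its dimensions, with one scratch entry per URI. The sketch below assumes a simple path-like URI layout and hypothetical simulation and variable names; it leaves out the Alias and Transform dimensions, and a plain dict stands in for the scratch database.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_request(
    dataset: str,
    simulations: Sequence[str],
    periods: Sequence[str],
    variables: Sequence[str],
) -> List[str]:
    """Return one URI per (simulation, period, variable) combination."""
    return [
        f"{dataset}/{simulation}/{period}/{variable}"
        for simulation, period, variable in product(simulations, periods, variables)
    ]


uris = expand_request(
    dataset="narclim",
    simulations=["RCM1-GCM1", "RCM2-GCM1"],   # hypothetical labels
    periods=["1990-2009", "2060-2079"],
    variables=["var_a"],                      # hypothetical name
)

# Stand-in for the scratch DB: one empty slot per URI, filled as data is read.
scratch_db: Dict[str, List[float]] = {uri: [] for uri in uris}
```

With 2 simulations, 2 periods and 1 variable this yields 4 URIs, which shows how quickly a modest request multiplies.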

Page 12: David Fuchs SPEDDEXES 2014

Wish list:

Composition of workflows from building blocks.

Hadoop Hive indexing

(good for multi site multi resource)

MapReduce

(need more machines for that)

Extended/user/script based transformations

A security nightmare
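
The first wish-list item, composing workflows from building blocks, could look something like the sketch below, in which each block is a plain function on a series and a workflow is their composition. The block names and the `compose` helper are hypothetical. Restricting users to pre-written blocks like these, rather than arbitrary scripts, is one possible way to limit the security exposure noted above.

```python
from functools import reduce
from typing import Callable, List

Series = List[float]
Step = Callable[[Series], Series]


def compose(*steps: Step) -> Step:
    """Chain steps left to right into a single workflow function."""
    def pipeline(data: Series) -> Series:
        return reduce(lambda acc, step: step(acc), steps, data)
    return pipeline


def kelvin_to_celsius(data: Series) -> Series:
    """Unit transform; rounded to two decimals to hide floating-point noise."""
    return [round(value - 273.15, 2) for value in data]


def mean_of_pairs(data: Series) -> Series:
    """Aggregate consecutive pairs of values into their mean."""
    return [(data[i] + data[i + 1]) / 2 for i in range(0, len(data) - 1, 2)]


workflow = compose(kelvin_to_celsius, mean_of_pairs)
result = workflow([273.15, 275.15, 283.15, 285.15])
```

Because every block shares the same signature, blocks can be reordered or reused across workflows without extra glue code.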

Page 13: David Fuchs SPEDDEXES 2014

Final message

Treat it as a startup

Roadmap beyond launch

Market to the end user from the start (e.g. http://launchsoon.com)

Know your deployment early (we are still figuring this out!)