Ben Evans SPEDDEXES 2014

SPEDDEXES (Spatially Explicit Data Discovery, Extraction and Evaluation Services) Ben Evans Assoc. Dir. Research Engagement and Initiatives


Australian National Computing Infrastructure: Providing smarter solutions and services to enhance the way we do ecosystem science

Transcript of Ben Evans SPEDDEXES 2014

Page 1: Ben Evans SPEDDEXES 2014

SPEDDEXES (Spatially Explicit Data Discovery, Extraction and Evaluation Services)

Ben Evans

Assoc. Dir. Research Engagement and Initiatives

Page 2: Ben Evans SPEDDEXES 2014

Community Platform for Earth Systems

Page 3: Ben Evans SPEDDEXES 2014

Geophys (10 TB)

Weather (2 PB)

BOM, GA, CSIRO, ANU

Marine (10 TB)

Lidar (80 TB)

International

Other National

CMIP5 Impact

Astronomy (Optical) 550 TB

Landsat (1 PB)

Water

Ocean 1.5+ PB

Atmos (2 PB)

Page 4: Ben Evans SPEDDEXES 2014

Broader Perspective

AIMS

CSIRO MAR

Geoscience Australia

BOM

Dept. of Defence AAD

Aust. Ocean Data Centre Joint Facility

(AODCJF)

Data Integration • eMII • MACDDAP

Data Generation • ARGO • SOOP • SOTS • ANFOG • AUV • ANMN • AATAMS • FAIMMS • SRS

NCRIS IMOS

Australian Ocean Data Network

Portals and Access

Data Management Components • ANDS • NCI • RDSI Other Components • AAF • AARNet

Data Management

Australian Research Data Commons

VIC

WA GA

TAS

NT QLD

Govt Geoscience Info. Committee

(GGIC)

SA

NSW

Data Integration • AuScope Grid • SISS • ARSDC

Data Generation • VCL • Geospatial • SAM • Earth Imaging • Earth Composition • Groundwater

NCRIS AuScope

AuScope Portal

Geoscience Portal

Research & Development

Government Operational

ANZLIC Spatial Information Council

Australian Spatial Data Directory

VIC

WA

OSDM

TAS

NT QLD

SA

NSW

ACT

NZ

ICSM

Data Integration • Atlas of Living Australia • Aust Phenomics Network

Data Generation Aust. Plant Phenomics Facility

NCRIS Integrated Biological Systems

Atlas of Living Australia

Australian Govt Water

VIC

WA

BOM

TAS NT

QLD

SA

NSW ACT

CSIRO

Aust Water Resources Information System

• Australian Spatial Consortium

• ASIBA • SSI • PSMA • 43 Pty Ltd

CRC for Spatial Information

NCRIS TERN

• e-MAST

• BCCVL

TERN. Climate & Weather

NCRIS CWSLab

•  ACCESS

•  CABLE

Australian Government

AGIMO Gov 2.0

CSSDP NAMF

NSS AGLS MDBC NWC

Aust. Govt. Online Service Point

GA

NZ

NT QLD NSW

VIC

WA ACT

TAS SA

CSIRO

Bureau of Met

Page 5: Ben Evans SPEDDEXES 2014

Earth Systems Major Data Collections

Climate & Weather:

• CMIP5
• CORDEX
• OFES
• ACCESS
• Major Reanalysis collections
• Observations (BoM)
• Ocean-Marine (BoM)
• Seasonal Climate
• YOTC - CoECSS
• OFES - CoECSS

Other Related:

• Landsat
• MODIS
• AVHRR
• VIIRS
• Bathymetry (GA)
• Digital Elevation (GA)
• AusCover
• (In prep: Soil Moisture and Properties, …)

Currently working through access and management arrangements

Page 6: Ben Evans SPEDDEXES 2014

Data Management Plan around data collections:

• Governance
• Roles & Responsibilities
• Licensing
• Reporting
• Versioning
• Data descriptions
• Release workflow
• Provenance
• Curation
• Operations and Services
• User Consultation
• Communications
• Capacity management
• Sustainability
• Documentation
• Support
• …

All the necessary work so that users can work with confidence and clarity

NCI : DOIs, Registries, National and International linkage to be discoverable

Page 7: Ben Evans SPEDDEXES 2014

ISO 19100 framework being used for geospatial

ISO dependencies

• gf - Feature: ISO 19109
• cv - Coverage (fields): ISO 19123
• md - Metadata: ISO 19115
• gm - Geometry: ISO 19107
• tm - Temporal: ISO 19108
• basic - Datatypes: ISO 19103
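The prefix-to-standard list on this slide can be read as a simple lookup table. The following is a minimal Python sketch of that table; the names `ISO_19100_PACKAGES` and `standard_for` are illustrative assumptions and do not come from any NCI or ISO tooling.

```python
# Package prefix -> (concept, ISO standard), copied from the slide's list.
ISO_19100_PACKAGES = {
    "gf": ("Feature", "ISO 19109"),
    "cv": ("Coverage (fields)", "ISO 19123"),
    "md": ("Metadata", "ISO 19115"),
    "gm": ("Geometry", "ISO 19107"),
    "tm": ("Temporal", "ISO 19108"),
    "basic": ("Datatypes", "ISO 19103"),
}


def standard_for(prefix: str) -> str:
    """Return the ISO standard for a package prefix (hypothetical helper)."""
    _concept, standard = ISO_19100_PACKAGES[prefix.lower()]
    return standard


if __name__ == "__main__":
    for prefix, (concept, standard) in ISO_19100_PACKAGES.items():
        print(f"{prefix:>5}  {concept:<18} {standard}")
```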

Page 8: Ben Evans SPEDDEXES 2014

Data Catalogue and Registry service

• Linkage to ANDS and RDA (Research Data Australia)
• Other National linkages:
  – NEII - National Environmental Information Infrastructure
  – Data.gov
• Domain Specific Registries
  – Astronomy IVOA – International Astronomy Virtual Observatory
  – Climate – Earth Systems Grid

Digital Object Identifiers

• ANDS and Community minting
• Linked to release of data

Integration with Data Services for the Collection

• Eg Specialised Environment that the community has built
• data movement service, portal interface, and NCI services

Page 9: Ben Evans SPEDDEXES 2014

Virtual Laboratories

• Provide a community-driven approach to building advanced High Performance/Data Intensive infrastructure.
• Leverage/Integrate large national and international investments in large-scale community projects (Software Stacks)
• Easier to incorporate broader community and International linkages.
• Integrates the NCI services and data collections
• Incorporates methods and patterns needed for scientific rigor.
  – Provenance, Versioning, Traceability, Discovery
  – standards, s-stacks
  – Move software package download to a Services/VL model.
• Largely shifts complex IT issues while maintaining operational management:
  – Hardware, common environments and basic systems
  – Dev/Ops frameworks

Page 10: Ben Evans SPEDDEXES 2014

Model accessibility – Climate and Weather Virtual Lab


Collaboration: Bureau of Meteorology, CSIRO, NCI

Page 11: Ben Evans SPEDDEXES 2014

CWSLab – a “standard” pattern

Model Configuration Database Initialisation

Input datasets

System Config

Model Config

Data Management, Provenance, Publishing International Federation

Analysis Interactive Analysis

UV-CDAT Analysis/Viz

Scalable Data Analysis

Model Run Environment (accessdev.nci.org.au)

• Build
• Submit
• Record Results
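The Build, Submit, Record Results cycle, together with the provenance box in this pattern, suggests that each run is stored alongside a description of exactly what produced it. The sketch below is one hypothetical way to build such a record in Python. The schema, the function names `config_digest` and `record_run`, and the example values are assumptions for illustration only; they are not the CWSLab implementation.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Hash a configuration so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_run(model_config: dict, system_config: dict,
               input_datasets: list, status: str) -> dict:
    """Build a provenance record for one model run (illustrative schema)."""
    return {
        "model_config_sha256": config_digest(model_config),
        "system_config_sha256": config_digest(system_config),
        "input_datasets": sorted(input_datasets),
        "status": status,
    }


if __name__ == "__main__":
    # All values below are made up for the example.
    record = record_run(
        model_config={"model": "example-model", "timestep_s": 1800},
        system_config={"cores": 128},
        input_datasets=["forcing-v2", "initial-conditions-v1"],
        status="completed",
    )
    print(json.dumps(record, indent=2))
```

Hashing a canonical form of the configuration means two runs with the same settings produce the same digest, which makes repeated or modified runs easy to tell apart.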

Page 12: Ben Evans SPEDDEXES 2014

Cloud services

Data analytics

Services

Data library

Hosting

Interfaces

Visualisation

Management Tools

CWSLab: Work Package 3 (NCI)

Workflow Provenance Rich Analysis Environment

Page 13: Ben Evans SPEDDEXES 2014

Earth system model

Coupler

Carbon

Terrestrial

Ocean and sea-ice

Atmospheric chemistry

Ocean and sea-ice

Carbon cycle (ACCESS-ESM1)

• Terrestrial – CABLE
• Ocean – Matear et al.
• Couple to modified ACCESS1.3
• Technical coupling essentially complete
• Multi-century trial simulations during 13/14

Atmospheric chemistry

• UKCA
• Collaboration (U. Melbourne)
• Couple to ACCESS-CM2
• Trial simulations (coupled) during 14/15

Page 14: Ben Evans SPEDDEXES 2014

Improving Data assimilation for Ecology Analysis

Recently Coupled NCAR DART to CABLE 2.0

Page 15: Ben Evans SPEDDEXES 2014

NCI HPC Cloud

High Performance Data

• High IOPS
• Throughput

Scientific Environments

• virtual labs
• New services not previously possible
• Supporting DevOps

Ongoing development

• increase performance
• software & hardware vendor collaboration

Page 16: Ben Evans SPEDDEXES 2014

Per-Tenant public IP assignments (CIDR boundaries)

[Diagram labels: FDR IB (×6), SSD (×6)]

OpenStack private IP (flat network*) - quota managed

[Diagram labels: NFS, Lustre, NFS]
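Assigning public IPs per tenant on CIDR boundaries means each tenant receives a whole, aligned subnet rather than scattered addresses. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `assign_tenant_blocks`, the tenant names, and the address pool are assumptions: 203.0.113.0/24 is a range reserved for documentation, not an NCI allocation.

```python
import ipaddress


def assign_tenant_blocks(pool: str, tenants: list, prefixlen: int) -> dict:
    """Give each tenant one /prefixlen subnet carved from the pool, in order."""
    network = ipaddress.ip_network(pool)
    subnets = network.subnets(new_prefix=prefixlen)
    assignments = {}
    for tenant in tenants:
        try:
            assignments[tenant] = next(subnets)
        except StopIteration:
            raise ValueError(
                f"pool {pool} exhausted before tenant {tenant!r}"
            ) from None
    return assignments


if __name__ == "__main__":
    # Hypothetical tenants and a documentation-only address range.
    blocks = assign_tenant_blocks(
        "203.0.113.0/24", ["climate", "geoscience", "astronomy"], 28
    )
    for tenant, block in blocks.items():
        print(f"{tenant:<12} {block}  ({block.num_addresses} addresses)")
```

Because every block starts on a CIDR boundary, a firewall or routing rule for one tenant is a single prefix and cannot overlap another tenant's range.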

Page 17: Ben Evans SPEDDEXES 2014

[Diagram labels: FDR IB links; NFS; Lustre; Supercomputer (raijin); Specialised environments; Data Services; Data transfer; lnet routers]

Page 18: Ben Evans SPEDDEXES 2014

Storage Services - /g/data

Filesystem: /g/data (NCI Global Data)

Type: Lustre v2.3.11 parallel distributed filesystem

Purpose: High Performance Filesystem available across all NCI systems

Capacity: 6.3 PB /g/data1; 3.1 PB /g/data2

Throughput: Max 60 GB/sec parallel access; Avg 500 MB/sec individual file

Connectivity: 56 Gbit Infiniband (Raijin compute nodes); 10 Gbit Ethernet (NCI datamovers, /g/data NFS servers)

Access Protocols: Native Lustre mount (Raijin compute nodes); SFTP/SCP/Rsync-ssh (NCI datamover nodes); NFSv3

Backup & Recovery: Lustre HSM, DMF backed with dual site tape copy (Q2/Q3 2014)

Best suited for: Active projects requiring high performance storage, accessible across multiple NCI systems (eg Raijin, Cloud)

Not suitable for: Intense high I/O applications such as scratch for HPC jobs. Deep archival or infrequently accessed datasets (cold data)
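The throughput figures above translate directly into read times. The sketch below works through the arithmetic in Python, assuming decimal units (1 TB = 10^12 bytes), that "GB/sec" and "MB/sec" mean bytes rather than bits, and that the quoted rates are sustained. The helper name `transfer_seconds` is illustrative.

```python
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15


def transfer_seconds(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Time to move size_bytes at a sustained rate, ignoring latency."""
    return size_bytes / rate_bytes_per_sec


if __name__ == "__main__":
    single = transfer_seconds(1 * TB, 500 * MB)     # one file, average rate
    parallel = transfer_seconds(1 * TB, 60 * GB)    # aggregate parallel rate
    full_scan = transfer_seconds(6.3 * PB, 60 * GB)  # all of /g/data1
    print(f"1 TB as a single file: {single / 60:.0f} minutes")
    print(f"1 TB in parallel:      {parallel:.1f} seconds")
    print(f"6.3 PB in parallel:    {full_scan / 3600:.1f} hours")
```

Under these assumptions a 1 TB file read as a single stream takes about 33 minutes, while the same volume spread across parallel readers at the peak rate takes under 20 seconds. That gap is why the analysis methods need to be parallel to use the filesystem well.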

Page 19: Ben Evans SPEDDEXES 2014

Application Development Environment

- Well-managed processes to meet security standards
- Managed releases using community co-management model

Fast Access to data at NCI

- Fast internal access to NCI storage
- Big network pipes to major networks and partners

Interactive Analysis

- Augmented environment to existing experience
- Direct access
- Specialised Environments and services not previously offered

New Scalable methods and techniques

- Geospatial+temporal “datacube” technologies
- Fast map-reduce methods

Supporting Dev+Ops of new Science environments
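To make the "datacube" and map-reduce items concrete, here is a small pure-Python sketch that computes a per-tile mean over time. The map step turns each (tile, time) chunk into a partial sum and count, and the reduce step merges the partials per tile. The chunk layout, the function names, and the sample values are assumptions for illustration; this is not NCI's datacube implementation.

```python
from collections import defaultdict
from functools import reduce


def map_chunk(chunk):
    """Map step: one (tile_id, time, values) chunk -> (tile_id, (sum, count))."""
    tile_id, _time, values = chunk
    return tile_id, (sum(values), len(values))


def merge(acc, item):
    """Reduce step: fold one partial (sum, count) into the per-tile totals."""
    tile_id, (partial_sum, partial_count) = item
    total, count = acc[tile_id]
    acc[tile_id] = (total + partial_sum, count + partial_count)
    return acc


def tile_means(chunks):
    """Mean of all values per tile across every time step."""
    partials = map(map_chunk, chunks)
    totals = reduce(merge, partials, defaultdict(lambda: (0.0, 0)))
    return {tile: total / count for tile, (total, count) in totals.items()}


if __name__ == "__main__":
    # Made-up chunks: (tile id, time index, pixel values).
    chunks = [
        ("tile_a", 0, [1.0, 3.0]),
        ("tile_a", 1, [5.0, 7.0]),
        ("tile_b", 0, [2.0, 4.0]),
    ]
    print(tile_means(chunks))
```

Each chunk is mapped independently, so the map step could be handed to a process pool or a batch system without changing the reduce logic.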

Page 20: Ben Evans SPEDDEXES 2014

SISS Stack – Spatial Information Services Stack

Developed by CSIRO using open standards and software – adopted/deployed widely in Australia

Page 21: Ben Evans SPEDDEXES 2014

Vistrails  

Page 22: Ben Evans SPEDDEXES 2014

Components of build for community 1

Basic OS functions

Common Modules

Bespoke Services

Special config choices

Software Stack

NCI Identity management

NCI NF mounts

Compiler X

Library Y sudo

Xserver/VNC

TDS

Remote batch submit

firewall mgmt

Analytics packages

GeoNet

6xTDS

Firewall ports X Nagios server

Licenseserver

GeoServer

Vocab Service

Vis tools

Page 23: Ben Evans SPEDDEXES 2014

Components of build for community 2

Basic OS functions

Common Modules

Bespoke Services

Special config choices

Software Stack

NCI Identity management

Compiler X

Library Y

Xserver/VNC

TDS

firewall mgmt

6xGridFTP

Firewall ports Y Nagios config2

Licenseserver

P2P

NCI NF mounts

sudo

Gridftp

Prov

Search Engine

Analytics packages

Vis tools

Page 24: Ben Evans SPEDDEXES 2014

Components of build for community 2

Basic OS functions

Common Modules

Bespoke Services

Special config choices

Software Stack

NCI Identity management

Compiler X

Library Y

Xserver/VNC

TDS

firewall mgmt

6xGridFTP

Firewall ports Y Nagios config2

Licenseserver

P2P

NCI NF mounts

sudo

Gridftp

Prov

Search Engine

Not-changed

Analytics packages

Vis tools

Page 25: Ben Evans SPEDDEXES 2014

Components of build for community 3

Basic OS functions

Common Modules

Bespoke Services

Special config choices

Super Software Stack

NCI Stack 1 NCI Env Stack

WorkflowX

Analytics Stack

2xStack1

Modify Stack1 Modify Stack 2 P2P

Vis Stack

Gridftp

Take stacks from upstream and use them as bundles

Page 26: Ben Evans SPEDDEXES 2014

NCI Core Bundles

Community1 repo Community2 repo

Virtual Laboratory Operational Bundle

DevOps approach to building and operating environments

Page 27: Ben Evans SPEDDEXES 2014

NCI Basic Bundle

• NCI ldap, mounting filesystems, firewalls, basic sysadmin packages
• basic desktop: Xserver, VNC

NCI “peak system” bundle

• package replication and module environment (inc compilers etc)
• Remote job submission
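The bundle approach can be pictured as layering lists of components: core NCI bundles first, then a community's own additions, with shared components installed once. The sketch below is a hypothetical Python illustration. The component identifiers are shorthand for items named on these slides, and `compose` is an assumed helper, not an NCI tool.

```python
# Shorthand identifiers for components named on the slides (assumed spellings).
NCI_BASIC = ["nci-ldap", "filesystem-mounts", "firewall",
             "sysadmin-packages", "xserver", "vnc"]
NCI_PEAK = ["package-replication", "module-environment",
            "remote-job-submission"]


def compose(*bundles):
    """Merge bundles in order, keeping the first occurrence of each component."""
    seen = set()
    merged = []
    for bundle in bundles:
        for component in bundle:
            if component not in seen:
                seen.add(component)
                merged.append(component)
    return merged


# A community repo adds its bespoke services on top of the core bundles.
community1 = ["tds", "geoserver", "vnc", "vocab-service"]
virtual_lab = compose(NCI_BASIC, NCI_PEAK, community1)

if __name__ == "__main__":
    for component in virtual_lab:
        print(component)
```

Keeping the core bundles first means a change to an NCI bundle flows into every virtual laboratory built on it, while each community repo only carries what is specific to that community.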

Page 28: Ben Evans SPEDDEXES 2014

• Direct login access using NCI account, interactive use
• VNC access (client, Chrome VNC client, …)
• Lots of NCI packages (from Raijin) but more GUIs as well
• New tools easily added

Analysis Bundle

Matlab*, IDL*, iPython, Spyder (IDE), Ferret, NCO, NCL, CDO, pyngl, Intel Compilers & libraries, NetCDF, HDF, PGPlot, Octave, GDAL, Vistrails

* Bring your own license