Ben Evans SPEDDEXES 2014
SPEDDEXES (Spatially Explicit Data Discovery, Extraction and Evaluation Services)
Ben Evans
Assoc. Dir. Research Engagement and Initiatives
Community Platform for Earth Systems
Geophys (10 TB)
Weather (2PB)
BOM GA CSIRO ANU
Marine (10 TB)
Lidar (80 TB)
International
Other National
CMIP5 Impact
Astronomy (Optical) 550TB
Landsat (1 PB) Water
Ocean 1.5+PB
Atmos (2PB)
Community Platform for Earth Systems
Broader Perspective
AIMS
CSIRO MAR
Geoscience Australia
BOM
Dept. of Defence AAD
Aust. Ocean Data Centre Joint Facility
(AODCJF)
Data Integration • eMII • MACDDAP
Data Generation • ARGO • SOOP • SOTS • ANFOG • AUV • ANMN • AATAMS • FAIMMS • SRS
NCRIS IMOS
Australian Ocean Data Network
Portals and Access
Data Management Components • ANDS • NCI • RDSI Other Components • AAF • AARNet
Data Management
Australian Research Data Commons
VIC
WA GA
TAS
NT QLD
Govt Geoscience Info. Committee
(GGIC)
SA
NSW
• Data Integration • AuScope Grid • SISS • ARSDC
Data Generation • VCL • Geospatial • SAM • Earth Imaging • Earth Composition • Groundwater
NCRIS AuScope
AuScope Portal
Geoscience Portal
Research & Development
Government Operational
ANZLIC Spatial Information Council
Australian Spatial Data Directory
VIC
WA
OSDM
TAS
NT QLD
SA
NSW
ACT
NZ
ICSM
Data Integration • Atlas of Living Australia • Aust Phenomics Network
Data Generation Aust. Plant Phenomics Facility
NCRIS Integrated Biological Systems
Atlas of Living Australia
Australian Govt Water
VIC
WA
BOM
TAS NT
QLD
SA
NSW ACT
CSIRO
Aust Water Resources Information System
• Australian Spatial Consortium
• ASIBA • SSI • PSMA • 43 Pty Ltd
CRC for Spatial Information
NCRIS TERN
• e-MAST
• BCCVL
TERN. Climate & Weather
NCRIS CWSLab
• ACCESS
• CABLE
Australian Government
AGIMO Gov 2.0
CSSDP NAMF
NSS AGLS MDBC NWC
Aust. Govt. Online Service Point
GA
NZ
NT QLD NSW
VIC
WA ACT
TAS SA
CSIRO
Bureau of Met
Earth Systems Major Data Collections
Climate & Weather Other Related
CMIP5 Landsat
CORDEX MODIS
OFES AVHRR
ACCESS VIIRS
Major Reanalysis collections
Bathymetry (GA)
Observations (BoM) Digital Elevation (GA)
Ocean-Marine (BoM) AusCover
Seasonal Climate (In prep: Soil Moisture and Properties, …)
YOTC - CoECSS
OFES - CoECSS Currently working through access and management arrangements
Data Management Plan around data collections:
Governance Roles & Responsibilities
Licensing
Reporting Versioning Data descriptions
Release workflow Provenance Curation
Operations and Services
User Consultation Communications
Capacity management
Sustainability Documentation
Support … …
All the necessary work so that users can work with confidence and clarity
NCI : DOIs, Registries, National and International linkage to be discoverable
ISO dependencies
• gf - Feature ISO 19109
• cv - Coverage (fields) ISO 19123
• md - Metadata ISO 19115
• gm - Geometry ISO 19107
• tm - Temporal ISO 19108
• basic - Datatypes ISO 19103
ISO 19100 framework being used for geospatial
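As a small illustration, the package prefixes in the ISO dependency list can be held in a lookup table so that a metadata or catalogue tool can report which standard a given package follows. This is only a sketch: the prefix-to-standard pairs come from the slide, while the names `ISO_19100_PACKAGES` and `standard_for` are hypothetical and not part of any NCI tooling.

```python
# Lookup of the package prefixes listed on the slide:
# prefix -> (package name, ISO standard).
ISO_19100_PACKAGES = {
    "gf": ("Feature", "ISO 19109"),
    "cv": ("Coverage (fields)", "ISO 19123"),
    "md": ("Metadata", "ISO 19115"),
    "gm": ("Geometry", "ISO 19107"),
    "tm": ("Temporal", "ISO 19108"),
    "basic": ("Datatypes", "ISO 19103"),
}


def standard_for(prefix: str) -> str:
    """Return the ISO standard for a package prefix (illustrative helper)."""
    _name, standard = ISO_19100_PACKAGES[prefix]
    return standard


if __name__ == "__main__":
    for prefix, (name, standard) in ISO_19100_PACKAGES.items():
        print(f"{prefix}: {name} -> {standard}")
```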
Data Catalogue and Registry service
• Linkage to ANDS and RDA (Research Data Australia)
• Other National linkages: – NEII - National Environmental Information Infrastructure
– Data.gov
• Domain Specific Registries – Astronomy IVOA – International Virtual Observatory Alliance
– Climate – Earth System Grid
Digital Object Identifiers • ANDS and Community minting
• Linked to release of data
Integration with Data Services for the Collection
• E.g. Specialised Environment that the community has built
• data movement service, portal interface, and NCI services
Virtual Laboratories
• Provide a community driven approach to building advanced High Performance/Data Intensive infrastructure.
• Leverage/Integrate large national and international investments in large scale community projects (Software Stacks)
• Easier to incorporate broader community and International linkages.
• Integrates the NCI services and data collections
• Incorporates methods and patterns needed for scientific rigor.
– Provenance, Versioning, Traceability, Discovery
– standards, s-stacks
– Move software package download to a Services/VL model.
• Largely shifts complex IT issues while maintaining operational management:
– Hardware, common environments and basic systems
– Dev/Ops frameworks
Model accessibility – Climate and Weather Virtual Lab
Collaboration: Bureau of Meteorology, CSIRO, NCI
CWSLab – a “standard” pattern
Model Configuration Database Initialisation
Input datasets
System Config
Model Config
Data Management, Provenance, Publishing International Federation
Analysis Interactive Analysis
UV-CDAT Analysis/Viz
Scalable Data Analysis
Model Run Environment (accessdev.nci.org.au) • Build • Submit • Record Results
Cloud services Data analytics
Services
Data library
Hosting
Interfaces
Visualisation Management Tools
CWSLab: Work Package 3 (NCI)
Workflow Provenance Rich Analysis
Environment
Earth system model
Coupler
Carbon
Terrestrial
Ocean and sea-ice
Atmospheric chemistry
Ocean and sea-ice
Carbon cycle (ACCESS-ESM1) • Terrestrial – CABLE • Ocean – Matear et al. • Couple to modified ACCESS1.3 • Technical coupling essentially complete • Multi-century trial simulations during 13/14
Atmospheric chemistry • UKCA • Collaboration (U. Melbourne) • Couple to ACCESS-CM2 • Trial simulations (coupled) during 14/15
Improving Data Assimilation for Ecology Analysis
Recently Coupled NCAR DART to CABLE 2.0
NCI HPC Cloud
High Performance Data • High IOPS • Throughput
Scientific Environments • virtual labs • New services not previously possible • Supporting DevOps
Ongoing development • increase performance • software & hardware vendor collaboration
[Diagram: NCI HPC Cloud network and storage layout. Labels: Per-tenant public IP assignments (CIDR boundaries); OpenStack private IP (flat network*), quota managed; FDR IB links; SSD; NFS; Lustre; Supercomputer (raijin); Specialised environments; Data Services; Data transfer; lnet routers]
Storage Services - /g/data
Filesystem: /g/data (NCI Global Data)
Type: Lustre v2.3.11 parallel distributed filesystem
Purpose: High Performance Filesystem available across all NCI systems
Capacity: 6.3 PB /g/data1; 3.1 PB /g/data2
Throughput: Max 60 GB/sec parallel access; Avg 500 MB/sec individual file
Connectivity: 56 Gbit Infiniband (Raijin compute nodes); 10 Gbit Ethernet (NCI datamovers, /g/data NFS servers)
Access Protocols: Native Lustre mount (Raijin compute nodes); SFTP/SCP/Rsync-ssh (NCI datamover nodes); NFSv3
Backup & Recovery: Lustre HSM, DMF backed with dual site tape copy (Q2/Q3 2014)
Best suited for: Active projects requiring high performance storage, accessible across multiple NCI systems (eg Raijin, Cloud)
Not suitable for: Intense high I/O applications such as scratch for HPC jobs; deep archival or infrequently accessed datasets (cold data)
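To put the /g/data throughput figures in perspective, the short calculation below estimates how long it would take to read 1 TB at the quoted average single-file rate (500 MB/sec) and at the quoted maximum parallel rate (60 GB/sec). It is a back-of-envelope sketch only: decimal units (1 TB = 10^12 bytes) are an assumption, the 1 TB dataset size is a made-up example, and real transfers will vary with striping, contention and file layout.

```python
# Decimal storage units (an assumption; the slide does not say decimal or binary).
MB = 10**6
GB = 10**9
TB = 10**12


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealised transfer time: size divided by sustained rate."""
    return size_bytes / rate_bytes_per_s


dataset = 1 * TB  # hypothetical dataset size

# Average rate quoted for an individual file.
single_file = transfer_seconds(dataset, 500 * MB)

# Maximum rate quoted for parallel access across the filesystem.
parallel_max = transfer_seconds(dataset, 60 * GB)

if __name__ == "__main__":
    print(f"single file stream: {single_file:.0f} s (~{single_file / 60:.0f} min)")
    print(f"max parallel access: {parallel_max:.1f} s")
```

The gap between the two figures is the reason the slide recommends the filesystem for workloads that can read many files or stripes in parallel.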
Application Development Environment
- Well-managed processes to meet security standards
- Managed releases using community co-management model
Fast Access to data at NCI
- Fast internal access to NCI storage
- Big network pipes to major networks and partners
Interactive Analysis
- Augmented environment to existing experience
- Direct access
- Specialised Environments and services not previously offered
New Scalable methods and techniques
- Geospatial+temporal “datacube” technologies
- Fast map-reduce methods
Supporting Dev+Ops of new Science environments
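The pairing of geospatial+temporal "datacube" storage with map-reduce methods can be sketched in a few lines. The example below is not NCI's implementation: the toy cube, the tile names and the function names are all hypothetical, and it uses only the Python standard library. Each chunk of the cube is keyed by (time index, tile id); the map step turns a chunk into a partial (sum, count), and the reduce step combines partials per tile to give a mean over time and space.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy datacube: (time_index, tile_id) -> 2D grid of values.
cube = {
    (0, "tileA"): [[1.0, 2.0], [3.0, 4.0]],
    (1, "tileA"): [[3.0, 4.0], [5.0, 6.0]],
    (0, "tileB"): [[10.0, 10.0], [10.0, 10.0]],
}


def map_chunk(key, grid):
    """Map step: emit (tile_id, (sum, count)) for one chunk of the cube."""
    _time_index, tile_id = key
    values = [v for row in grid for v in row]
    return tile_id, (sum(values), len(values))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def mean_per_tile(cube):
    """Group mapped partials by tile, reduce them, and return the mean per tile."""
    grouped = defaultdict(list)
    for key, grid in cube.items():
        tile_id, partial = map_chunk(key, grid)
        grouped[tile_id].append(partial)
    means = {}
    for tile_id, partials in grouped.items():
        total, count = reduce(combine, partials)
        means[tile_id] = total / count
    return means


means = mean_per_tile(cube)

if __name__ == "__main__":
    print(means)
```

Because the map step touches each chunk independently and `combine` is associative, the chunks could be processed on separate workers and merged in any order, which is what makes the pattern scale.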
SISS Stack – Spatial Information Services Stack
Developed by CSIRO using open standards and software – adopted/deployed widely in Australia
Vistrails
Components of build for community 1
Basic OS functions
Common Modules
Bespoke Services
Special config choices
Software Stack
NCI Identity management
NCI NF mounts
Compiler X
Library Y sudo
Xserver/VNC
TDS
Remote batch submit
firewall mgmt
Analytics packages
GeoNet
6xTDS
Firewall ports X Nagios server
Licenseserver
GeoServer
Vocab Service
Vis tools
Components of build for community 2
Basic OS functions
Common Modules
Bespoke Services
Special config choices
Software Stack
NCI Identity management
Compiler X
Library Y
Xserver/VNC
TDS
firewall mgmt
6xGridFTP
Firewall ports Y Nagios config2
Licenseserver
P2P
NCI NF mounts
sudo
Gridftp
Prov
Search Engine
Analytics packages
Vis tools
Components of build for community 2
Basic OS functions
Common Modules
Bespoke Services
Special config choices
Software Stack
NCI Identity management
Compiler X
Library Y
Xserver/VNC
TDS
firewall mgmt
6xGridFTP
Firewall ports Y Nagios config2
Licenseserver
P2P
NCI NF mounts
sudo
Gridftp
Prov
Search Engine
Not-changed
Analytics packages
Vis tools
Components of build for community 3
Basic OS functions
Common Modules
Bespoke Services
Special config choices
Super Software Stack
NCI Stack 1 NCI Env Stack
WorkflowX
Analytics Stack
2xStack1
Modify Stack1 Modify Stack 2 P2P
Vis Stack
Gridftp
Take stacks from upstream and use them as bundles
NCI Core Bundles
Community1 repo Community2 repo
Virtual Laboratory Operational Bundle
DevOps approach to building and operating environments
NCI Basic Bundle • NCI ldap, mounting filesystems, firewalls, basic sysadmin packages • basic desktop: Xserver, VNC
NCI “peak system” bundle • package replication and module environment (inc compilers etc) • Remote job submission
• Direct login access using NCI account, interactive use • VNC access (client, Chrome VNC client, …) • Lots of NCI packages (from Raijin) but more GUIs as well • New tools easily added
Matlab* IDL* iPython
Spyder (IDE) Ferret NCO
NCL CDO pyngl
Intel Compilers & libraries
NetCDF HDF
PGPlot Octave GDAL
Vistrails
Analysis Bundle
* Bring your own license
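One way to picture the bundle approach is as a small dependency graph: a virtual laboratory names the bundle it wants, and the build resolves everything underneath it in install order. The sketch below is illustrative only. The bundle contents are taken loosely from the slides, but the identifiers (`nci-basic`, `nci-peak-system`, `analysis`), the `requires` relationships between bundles and the `resolve` helper are assumptions, not NCI's actual DevOps tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """A named set of packages plus the bundles it builds on (hypothetical model)."""
    name: str
    packages: tuple
    requires: tuple = ()


# Assumed layering: analysis -> peak system -> basic.
CATALOGUE = {
    b.name: b
    for b in (
        Bundle("nci-basic", ("ldap", "filesystem-mounts", "firewall", "xserver", "vnc")),
        Bundle(
            "nci-peak-system",
            ("module-environment", "compilers", "remote-job-submission"),
            requires=("nci-basic",),
        ),
        Bundle(
            "analysis",
            ("ipython", "nco", "cdo", "netcdf", "gdal"),
            requires=("nci-peak-system",),
        ),
    )
}


def resolve(target: str, catalogue: dict) -> list:
    """Return bundle names in install order, dependencies first.

    Assumes the dependency graph has no cycles.
    """
    order = []

    def visit(name: str) -> None:
        if name in order:
            return
        for dep in catalogue[name].requires:
            visit(dep)
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    print(resolve("analysis", CATALOGUE))
```

Keeping each bundle small and declaring what it builds on is what lets a community repository add its own bundle on top of the core ones without copying them.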