Post on 14-Jan-2016
NCAR
Cyberinfrastructure for Earth System Modeling
Don Middleton
NCAR Scientific Computing Division
APAN eScience Workshop, Honolulu
January 28, 2004
Cyberinfrastructure for Earth System Modeling
Supercomputers
High-bandwidth networks
Models
Data centers and Grids
Collaboratories
Analysis and Visualization
“Atkins Report”
“A new age has dawned…”
“The Panel’s overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research.”
Atkins Report, Executive Summary
Characteristics of Infrastructure (from Kim Mish workshop presentation)
Essential
– So important that it becomes ubiquitous
Reliable
– Example: the built environment of the Roman Empire
Expensive
– Nothing succeeds like excess (e.g. Interstate system)
– Inherently one-off (often, few economies of scale)
Clear factorization between research and practice
– Generally deploy what provably works
A Global Coupled Climate Model
Climate Model Data Production
T42 CCSM (current, 280km)
– 7.5GB/yr, 100 years -> .75TB
T85 CCSM (140km)
– 29GB/yr, 100 years -> 2.9TB
T170 CCSM (70km)
– 110GB/yr, 100 years -> 11TB
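The per-run totals above follow from multiplying the yearly output by the run length. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB); the function and variable names are illustrative and not part of the original slides:

```python
# Per-year output sizes (GB) quoted on the slide for each CCSM resolution.
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_volume_tb(gb_per_year, years=100, gb_per_tb=1000):
    """Total output of one simulation in TB (assumes 1 TB = 1000 GB)."""
    return gb_per_year * years / gb_per_tb


# Volume of a 100-year run at each resolution.
volumes = {name: run_volume_tb(gb) for name, gb in GB_PER_YEAR.items()}

for name, tb in volumes.items():
    print(f"{name}: {tb:.2f} TB per 100-year run")
```

Each halving of the grid spacing roughly quadruples the yearly output, which is what makes the higher resolutions expensive to store.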
Capacity-related Improvements
Increased turnaround, model development, ensemble of runs
Increase by a factor of 10, linear data
Current T42 CCSM
– 7.5GB/yr, 100 years -> .75TB * 10 = 7.5TB
CCM at T170 Resolution
Capability-related Improvements
Spatial Resolution: T42 -> T85 -> T170
Increase by factor of ~ 10-20, linear data
Temporal Resolution: Study diurnal cycle, 3 hour data
Increase by factor of ~ 4, linear data
CCM3 at T170 (70km)
Capability-related Improvements
Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice
Increase by another factor of 2-3, data flat
Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), Middle Atmosphere Model…
Increase by another factor of 10+, linear data
Model Improvement Wishlist
Grand Total:
Increase compute by a Factor O(1000-10000)
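The slides do not say how the grand total was derived. One plausible reading, sketched below, is that the individual factors from the preceding slides are treated as independent and multiplied together. That independence is an assumption, as is using 10 as the floor for the "10+" scope factor.

```python
from math import prod

# (low, high) compute multipliers as quoted on the preceding slides.
CAPABILITY = {
    "spatial resolution": (10, 20),
    "temporal resolution": (4, 4),
    "quality": (2, 3),
    "scope": (10, 10),  # slide says "10+"; 10 is used as a floor (assumption)
}
CAPACITY = (10, 10)  # turnaround, model development, ensembles


def combined(factors):
    """Multiply the low ends and the high ends of a list of (low, high) pairs."""
    lows, highs = zip(*factors)
    return prod(lows), prod(highs)


capability_range = combined(CAPABILITY.values())
total_range = combined(list(CAPABILITY.values()) + [CAPACITY])

print("capability only:", capability_range)
print("capability and capacity:", total_range)
```

Under these assumptions the capability factors alone give roughly 800 to 2,400, and adding the capacity factor gives roughly 8,000 to 24,000, which is the same order of magnitude as the O(1000-10000) quoted on the slide.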
Advances at the Earth Simulator
ESC Climate Model at T1279 (approx. 10km)
Longer-term Missions - Observation of Key Earth System Interactions
Terra
Aura
Aqua
Landsat 7
Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies
GRACE
PICASSO
Cloudsat
QuikScat
EO-1
ICEsat
Jason-1
SRTM
VCL
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
Triana
Courtesy of Tim Killeen, NCAR
The Earth System Grid
U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
Build upon Globus Toolkit and DataGrid technologies and deploy
Potential broad application to other areas
http://www.earthsystemgrid.org
ESG Team
ANL
– Ian Foster (PI)
– Veronika Nefedova
– (John Bresenhan)
– (Bill Allcock)
LBNL
– Arie Shoshani
– Alex Sim
ORNL
– David Bernholdte
– Kasidit Chanchio
– Line Pouchard
LLNL/PCMDI
– Bob Drach
– Dean Williams (PI)
USC/ISI
– Anne Chervenak
– Carl Kesselman
– (Laura Perlman)
NCAR
– David Brown
– Luca Cinquini
– Peter Fox
– Jose Garcia
– Don Middleton (PI)
– Gary Strand
ESG Scenario
End 2002: 1.2 million files comprising ~75TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI
End 2007: As much as 3 PB (3,000 TB) of data (!)
Current practice is already broken – the future will be even worse if something isn’t done…
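To put the scenario figures in perspective, the short calculation below derives the average file size in 2002 and the yearly growth rate implied by going from ~75 TB to 3 PB in five years. It is a back-of-the-envelope sketch that assumes decimal units and steady compound growth, neither of which the slide states.

```python
files_2002 = 1.2e6   # files at end of 2002
tb_2002 = 75.0       # TB at end of 2002
tb_2007 = 3000.0     # TB projected for end of 2007
years = 2007 - 2002

# Average file size, assuming 1 TB = 1e6 MB.
avg_file_mb = tb_2002 * 1e6 / files_2002

# Overall growth and the equivalent constant yearly multiplier.
growth_total = tb_2007 / tb_2002
growth_per_year = growth_total ** (1 / years)

print(f"average file size: {avg_file_mb:.1f} MB")
print(f"total growth: {growth_total:.0f}x over {years} years")
print(f"implied yearly growth: {growth_per_year:.2f}x")
```

A 40-fold increase over five years corresponds to the archive slightly more than doubling every year.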
ESG: Challenges
Enabling the simulation and data management team
Enabling the core research community in analyzing and visualizing results
Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.
ESG: Strategies
Harness a federation of sites, web portals
– Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
Move data a minimal amount, keep it close to computational point of origin when possible
– Data access protocols, distributed analysis
When we must move data, do it fast and with a minimum amount of human intervention
– Storage Resource Management, fast networks
Keep track of what we have, particularly what’s on deep storage
– Metadata and Replica Catalogs
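The idea behind a replica catalog can be illustrated with a toy example: a logical file name maps to every physical copy, and a client prefers the copy closest to where the computation runs. The sketch below is an in-memory stand-in, not the Globus replica services used by ESG. The class, the host names, and the file path are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalog: logical file name -> set of physical URLs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record one more physical copy of a logical file."""
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name):
        """Return all known copies in a stable (sorted) order."""
        return sorted(self._replicas.get(logical_name, ()))

    def pick(self, logical_name, preferred_site):
        """Prefer a copy already at the site doing the computation."""
        urls = self.lookup(logical_name)
        if not urls:
            raise KeyError(logical_name)
        local = [u for u in urls if preferred_site in u]
        return local[0] if local else urls[0]


# Hypothetical hosts and path, for illustration only.
catalog = ReplicaCatalog()
name = "ccsm/run01/atm.0001.nc"
catalog.register(name, "gsiftp://storage.site-a.example/ccsm/run01/atm.0001.nc")
catalog.register(name, "gsiftp://storage.site-b.example/ccsm/run01/atm.0001.nc")

chosen = catalog.pick(name, preferred_site="site-b.example")
fallback = catalog.pick(name, preferred_site="site-c.example")
print(chosen)
```

Choosing a co-located copy first reflects the strategy of moving data as little as possible. A transfer only happens when no local copy exists.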
Storage/Data Management
[Diagram: a client (selection, control, monitoring) with its own HRM, connected to two servers, each with an HRM in front of a tera/peta-scale archive]
Tools for reliable staging, transport, and replication
OPeNDAP
An Open Source Project for a Network Data Access Protocol
(originally DODS, the Distributed Oceanographic Data System)
OPeNDAP-g
– Transparency
– Performance
– Security
– Authorization
– (Processing)
[Diagram: data access paths for an application]
Typical Application: Application, netCDF lib, Data (local)
OPeNDAP via http: Application, OPeNDAP Client, OpenDAP Server, Data (remote)
OPeNDAP via Grid: Application, ESG client, ESG + DODS, ESG Server, Big Data (remote)
Distributed Application: Distributed Data Access Services, data
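In the http case, a client asks the server for a subset of a remote variable by appending a constraint expression to the dataset URL, so only the requested slab crosses the network. The sketch below builds such a URL using the DAP2 `[start:stride:stop]` index syntax and response suffixes. The server address, file path, and variable name are hypothetical, and the helper functions are illustrative rather than part of any OPeNDAP client library.

```python
def hyperslab(start, stop, stride=1):
    """DAP2 index range, inclusive at both ends: [start:stride:stop]."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def dap_url(dataset_url, variable, ranges, response="ascii"):
    """Build a request URL for a subset of one variable.

    response is the DAP2 suffix: "dds", "das", "dods" or "ascii".
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical server and variable: 12 time steps on a 64 x 128 grid.
url = dap_url(
    "http://opendap.example.org/ccsm/run01/atm.nc",
    "TS",
    [(0, 11), (0, 63), (0, 127)],
)
print(url)
```

Subsetting on the server side is one way to follow the "move data a minimal amount" strategy without changing how the application reads its arrays.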
ESG: NcML Core Schema
For XML encoding of metadata (and data) of any generic netCDF file
Objects: netCDF, dimension, variable, attribute
Beta version reference implementation as Java Library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)
[Schema diagram: netCDF (nc:netCDFType) contains nc:dimension, nc:variable, and nc:attribute; nc:variable (nc:VariableType) contains nc:attribute and nc:values]
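As a rough illustration of what such an encoding looks like, the sketch below writes the four object kinds named on the slide (netCDF, dimension, variable, attribute) as nested XML elements. The element and attribute names are modeled on the slide's object list and are an assumption. They are not guaranteed to match the actual NcML schema or its namespace, and the file described is invented.

```python
import xml.etree.ElementTree as ET


def describe(name, dimensions, variables, global_attrs):
    """Return an XML tree describing one netCDF-like file (illustrative layout)."""
    root = ET.Element("netcdf", {"name": name})
    for dim_name, length in dimensions.items():
        ET.SubElement(root, "dimension", {"name": dim_name, "length": str(length)})
    for key, value in global_attrs.items():
        ET.SubElement(root, "attribute", {"name": key, "value": str(value)})
    for var in variables:
        node = ET.SubElement(
            root,
            "variable",
            {"name": var["name"], "type": var["type"], "shape": " ".join(var["shape"])},
        )
        for key, value in var.get("attrs", {}).items():
            ET.SubElement(node, "attribute", {"name": key, "value": str(value)})
        if "values" in var:
            # Data values are optional; metadata alone is a valid description.
            ET.SubElement(node, "values").text = " ".join(str(v) for v in var["values"])
    return root


# Invented example file: a small grid with one coordinate and one data variable.
doc = describe(
    name="example.nc",
    dimensions={"lat": 3, "lon": 4},
    variables=[
        {"name": "lat", "type": "float", "shape": ["lat"],
         "attrs": {"units": "degrees_north"}, "values": [-45.0, 0.0, 45.0]},
        {"name": "TS", "type": "float", "shape": ["lat", "lon"],
         "attrs": {"units": "K"}},
    ],
    global_attrs={"title": "illustrative file"},
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Because the description is plain XML, it can be harvested into a metadata catalog without opening the binary file again.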
[Class diagram: metadata object model]
Object: [1] id
Activity: [0,1] name, [0,1] description, [0,1] rights, [0,n] date type=, [0,n] note, [0,n] participant role=, [0,n] reference uri=
Investigation
Project: [0,n] topic type=, [0,1] funding
Ensemble
Campaign
Simulation: [0,n] simulationInput type=, [0,n] simulationHardware
Observation
Experiment
Analysis
Dataset: [0,1] type, [0,1] conventions, [0,n] date type=, [0,n] format type= uri=, [0,1] timeCoverage, [0,1] spaceCoverage
Person: [0,1] firstName, [0,1] lastName, [0,1] contact
Institution: [0,1] name, [0,1] type, [0,1] contact
Service: [0,1] name, [0,1] description
Relationships: isA, isPartOf, hasParent, hasChild, hasSibling, generatedBy, worksFor, participant role=, serviceId
Legend: Class, AbstractClass, inheritance, association
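The cardinality notation in the diagram maps naturally onto typed fields: `[1]` is a required value, `[0,1]` an optional one, and `[0,n]` a list. The sketch below shows that mapping for a few of the classes. Which classes inherit from which, and the direction of the isPartOf and generatedBy links, cannot be recovered from the flattened diagram, so the hierarchy used here is an assumption, and only a subset of the attributes is included.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    id: str                                          # [1] id
    name: Optional[str] = None                       # [0,1] name
    description: Optional[str] = None                # [0,1] description
    notes: List[str] = field(default_factory=list)   # [0,n] note


@dataclass
class Project(Activity):                             # assumed: Project isA Activity
    topics: List[str] = field(default_factory=list)  # [0,n] topic
    funding: Optional[str] = None                    # [0,1] funding


@dataclass
class Simulation(Activity):                          # assumed: Simulation isA Activity
    simulation_inputs: List[str] = field(default_factory=list)    # [0,n]
    simulation_hardware: List[str] = field(default_factory=list)  # [0,n]
    part_of: Optional[Project] = None                # assumed isPartOf target


@dataclass
class Dataset:
    id: str                                          # [1] id
    type: Optional[str] = None                       # [0,1] type
    conventions: Optional[str] = None                # [0,1] conventions
    time_coverage: Optional[str] = None              # [0,1] timeCoverage
    generated_by: Optional[Activity] = None          # assumed generatedBy target


# Invented identifiers, for illustration only.
project = Project(id="proj-1", name="Example project", topics=["climate"])
run = Simulation(id="sim-1", name="Example run", part_of=project)
data = Dataset(id="ds-1", type="model output", generated_by=run)
print(data.generated_by.part_of.name)
```

Following the links from a dataset back to its simulation and project is what lets a portal answer questions such as which run produced a given file.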
ESG Current Topology
[Diagram: services deployed across the ESG sites]
Sites: LBNL, ISI, LLNL, NCAR, ORNL, ANL
ESG web portal (Tomcat/Struts)
OGSA-DAI, MySQL RDBMS (query)
MyProxy (authenticate), CAS
RLI and LRC at four sites (query, cross-update)
Storage: MSS HRM, HPSS HRM (two), DISK HRM, DISK
gridFTP servers (four), linked by gridFTP transfers
GRAM gatekeeper (submit, execute)
LAS server (visualize)
Data->Knowledge
Mass Storage System (1.3PB)
Petascale Knowledge Repository
Establish new paradigms for managing and accessing scientific data based on semantic organization.
Collaborations & Relationships
CCSM Data Management Group
The Globus Project
Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
OPeNDAP/DODS (multi-agency)
NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
U.K. e-Science and British Atmospheric Data Center
NOAA NOMADS and CEOS-grid
Earth Science Portal group (multi-agency, intnl.)
ESMF (emerging)
NCAR Command Language (NCL)
NCL: Core
Approx. 500 built-in functions and procedures
– File I/O & data model for Earth sciences
– Unique grids, Climate-modeling routines
– Spherical harmonics, Regridding and interpolation
– Graphics (wind barbs, simple 3D plots)
36 NCL core visual representations
– Contours, XY plots, vectors, streamlines, maps, histograms, text, markers, polygons
Supported on Unix, Linux, Mac, and PC
10 years, 20 people involved with development, 50 person-years of effort, about 1.5 million lines of source, 500K lines of documentation
NCL as CI for a Community
CAM & CCSM Processor – 100 functions, 200 examples, 20K lines of NCL code (CGD)
WGNE Climate Diagnostics Processor – 10K lines of NCL code (CGD)
Award-winning Aviation Weather Site (RAP)
MM5 Analysis Package (RIP)
Weather Research & Forecast Model: Initial community analysis software and RIP
Community Data Portal (SCD)
NCL
http://ngwww.ucar.edu/ncl
Collaborative Environments and the AccessGrid
Science Portals + AccessGrid:
University of Michigan (Knoop, Hardin)
Vegetation & Ecosystem Mapping Program (VEMAP)
NCAR/SCD VETS/KEG
Argonne National Labs
END