GFDL Data Portal
GFDL Data Portal: Current Status, Achievements and Future Development
NOAATECH-2006
K. Dixon, V. Balaji, S. Nikonov (GFDL, Princeton)
History
The Data Portal was launched in 1995 as a simple FTP server.
The idea and the term “Data Portal” arose 3 years ago.
Originally it served data in response to occasional requests.
Now the main assets are IPCC data.
Common technical characteristics
Software
  Red Hat Linux
  Apache Web Server
  DODS Aggregation Server
  THREDDS
  LAS Server
  GrADS-DODS
Hardware
  Dell PowerEdge 2650 machine
  Dual Processor Intel Xeon 2.4 GHz
  3 GB RAM
  7 Dell PowerVault 220S enclosures with 14 hard drives each, 19 TB total (expansion to 35 TB pending)
  Network bandwidth: Internet, 9 Mbit/s; Internet2, 100 Mbit/s
Web Site Structure
[Site map diagram. Recoverable labels: Main GFDL Page; Data Portal; Model description ….; CM2.0 Model; CM2.1 Model; Ocean Data Assimilation Experiments; Ocean Simulation; LAS Server; Metadata; IPCC DATA (http protocol); IPCC DATA (ftp protocol); Flexible Modeling System; Dataset]
Basic Metadata
  Model description
  Experiment description
  Institution
  Extra metadata for treating tripolar grids (including Ferret scripts for their visualization)
  Metadata complies with the CF standard
  Metadata accompanies each data file
Basic features of the GFDL LAS server
  Dynamic data presentation chosen by the user
  Spatial/time subsampling with metadata included
  Defining new variables on the fly, calculated from a given formula
  Ferret visualization
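The "new variables calculated from a given formula" feature can be illustrated with a minimal sketch. This is not how LAS or Ferret implement it; it is a hypothetical, standard-library-only evaluator that applies a user formula point by point to equally long lists of values. The function names (`derive_variable`, `_eval_node`) and the variable names in the usage note are assumptions made for illustration.

```python
import ast
import operator

# Arithmetic operators this sketch allows inside a user formula.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_node(node, point):
    """Evaluate one parsed formula node for a single data point."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body, point)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return point[node.id]  # KeyError if the formula names an unknown variable
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left, point)
        right = _eval_node(node.right, point)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand, point)
    # Function calls, attribute access, etc. are rejected on purpose.
    raise ValueError("unsupported element in formula")


def derive_variable(formula, variables):
    """Apply `formula` element-wise to equally long lists of values."""
    tree = ast.parse(formula, mode="eval")
    lengths = {len(values) for values in variables.values()}
    if len(lengths) != 1:
        raise ValueError("all input variables must have the same length")
    count = lengths.pop()
    return [
        _eval_node(tree, {name: values[i] for name, values in variables.items()})
        for i in range(count)
    ]
```

For example, `derive_variable("(tmax + tmin) / 2", {"tmax": [10, 20], "tmin": [0, 10]})` returns `[5.0, 15.0]`. Parsing the formula instead of passing it to `eval` keeps arbitrary code out of a public-facing server, which is the main design point of the sketch.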
General Statistics, 01-Oct-2004 to 01-Oct-2005
  Total amount of CM2 Climate Model Data: 12 TB
  More than 10,000 NetCDF files, average file size: 1 GB
  Successful requests: ~62,000
  Average successful requests per day: ~200
  Distinct files requested: 5,000
  Distinct hosts served: ~850
  Data transferred: 15 TB
  Average data transferred per day: ~42 GB
  Number of journal articles submitted that include analyses of GFDL CM2 model output: > 100
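The per-day averages can be cross-checked from the yearly totals. The sketch below only redoes the arithmetic from the figures quoted above. The choice between 1 TB = 1024 GB and 1 TB = 1000 GB is an assumption, since the slide does not state its convention.

```python
from datetime import date

# Reporting period quoted on the slide.
days = (date(2005, 10, 1) - date(2004, 10, 1)).days  # 365

successful_requests = 62_000  # "~62,000"
data_transferred_tb = 15      # "15 TB"

requests_per_day = successful_requests / days

# Assumption: the slide does not say whether TB and GB are binary or decimal.
gb_per_day_binary = data_transferred_tb * 1024 / days
gb_per_day_decimal = data_transferred_tb * 1000 / days

print(f"requests per day: {requests_per_day:.0f}")
print(f"GB per day (1 TB = 1024 GB): {gb_per_day_binary:.1f}")
print(f"GB per day (1 TB = 1000 GB): {gb_per_day_decimal:.1f}")
```

The transfer figure agrees with the slide's ~42 GB under the binary convention. The request figure works out to roughly 170 per day, so the slide's ~200 is presumably a loose rounding of approximate totals.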
Current standard procedure of publishing data
Climate Model Output Rewriter (CMOR) processing
  configured manually for different models, experiments, variables
  triggered manually
Quality Control
  performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
Splitting CMORized, QC-ed data into small (<2 GB) NetCDF files and pushing them through the firewall to the Data Portal
  the scripts doing this are configured manually
  the scripts are started manually
Preparing a checksum report on the Data Portal
  done by a cron-started script
Configuring the Aggregation Server and LAS
  done manually
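The checksum-report step in the procedure above might look like the following minimal sketch. The slides do not show the actual script, so the function names, the SHA-256 algorithm, the `*.nc` file pattern, and the report line layout are all assumptions.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large NetCDF files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_report(root, pattern="*.nc", algorithm="sha256"):
    """Return 'checksum  relative/path' lines for every matching file under root."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            lines.append(f"{file_checksum(path, algorithm)}  {relative}")
    return lines
```

A cron job could call `checksum_report` on the portal's data directory and write the returned lines to a file, letting users verify downloads against it.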
Current Data Portal workflow
Desirable Features of Data Portal
Relational database storing metadata with descriptions of
  model components and model configuration
  scenarios
  postprocessing (model output and CMOR)
  experiments
  variables
  formalized rules of Quality Control
  data locations in the archive
  task scheduler
  user and group accounts
XML as data exchange format
  for compliance with the FMS Runtime Environment (FRE)
  working format of existing third-party software
  well suited to hierarchical metadata description
  widely used, easy to exchange with other Data Portals
Publisher Control Center (PCC)
  controls CMOR subsystem
  controls Data Publisher Manager
  controls data quality (QAC)
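Why XML suits hierarchical metadata can be shown with a small round trip. The element and attribute names below (`experiment`, `institution`, `scenario`, `model`, `component`, `realm`) are hypothetical and are not taken from FRE or any existing schema. The sketch only shows nested metadata surviving serialization and parsing.

```python
import xml.etree.ElementTree as ET


def experiment_to_xml(record):
    """Serialize a nested metadata record into an XML string."""
    root = ET.Element("experiment", name=record["name"])
    ET.SubElement(root, "institution").text = record["institution"]
    ET.SubElement(root, "scenario").text = record["scenario"]
    model = ET.SubElement(root, "model", name=record["model"]["name"])
    for component in record["model"]["components"]:
        ET.SubElement(model, "component", realm=component)
    return ET.tostring(root, encoding="unicode")


def xml_to_experiment(text):
    """Parse the XML produced above back into the same nested record."""
    root = ET.fromstring(text)
    model = root.find("model")
    return {
        "name": root.get("name"),
        "institution": root.findtext("institution"),
        "scenario": root.findtext("scenario"),
        "model": {
            "name": model.get("name"),
            "components": [c.get("realm") for c in model.findall("component")],
        },
    }


# Illustrative values only; the experiment and scenario names are made up.
record = {
    "name": "example_experiment",
    "institution": "GFDL",
    "scenario": "example_scenario",
    "model": {"name": "CM2.1", "components": ["atmosphere", "ocean"]},
}
assert xml_to_experiment(experiment_to_xml(record)) == record
```

The model-to-component nesting maps directly onto parent and child elements, the property the slide refers to as hierarchical metadata description.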
Desirable Features of Data Portal (continued)
Climate Model Output Rewriter (CMOR) subsystem
  prepares data consistently with specific project requirements
Data Publisher Manager
  transfers data to the target destination according to settings from the DB
Front-end Data Portal Software Package
  Configuration Manager (configures Aggregation Server and Data Portal Interface)
  Search Catalog Engine
  Data Subsampling Engine
  Data Computation Engine
  Data Visualization
  Data Delivery Manager
Proposed functionality schema of ‘GFDL Data Factory’
Standard scenario of the Model Data Factory in operation (ideal picture)
A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) using available model components, datasets and a forcing scenario.
FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB and executes it; on finishing, it puts metadata about the processed experiment back into the DB.
The Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
CMOR gets its metadata from the “curator” DB and processes the needed data following the metadata instructions.
DP calls QAC and then transfers the data to Data Portal storage.
The Configuration Manager configures the Aggregation Server and Data Portal Interface and puts records about the new public data into the “curator” DB.
End of process; the data is ready to go.
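The Data Publisher part of this scenario (poll the DB, invoke CMOR, call QAC, transfer) can be sketched as a single polling pass. This is a hypothetical illustration, not the GFDL implementation. The `experiments` table, its `visibility` and `published` columns, and the three callables passed in are all assumptions, and SQLite stands in for whatever database engine the “curator” DB actually uses.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for the "curator" DB (hypothetical table layout)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE experiments ("
        "id INTEGER PRIMARY KEY, name TEXT, visibility TEXT, "
        "published INTEGER DEFAULT 0)"
    )
    db.executemany(
        "INSERT INTO experiments (name, visibility) VALUES (?, ?)",
        [("exp_a", "public"), ("exp_b", "private"), ("exp_c", "public")],
    )
    db.commit()
    return db


def publish_new_experiments(db, run_cmor, run_qac, transfer):
    """One polling pass: CMOR, then QAC, then transfer, for each new public experiment."""
    rows = db.execute(
        "SELECT id, name FROM experiments "
        "WHERE visibility = 'public' AND published = 0 ORDER BY id"
    ).fetchall()
    published = []
    for exp_id, name in rows:
        files = run_cmor(name)      # rewrite output per project requirements
        if not run_qac(files):      # quality check gates the transfer
            continue                # left unpublished; retried on the next pass
        transfer(files)             # push to Data Portal storage
        db.execute("UPDATE experiments SET published = 1 WHERE id = ?", (exp_id,))
        published.append(name)
    db.commit()
    return published
```

A scheduler would call `publish_new_experiments` at regular intervals. Because the published flag lives in the database, a repeated pass skips experiments that have already gone out.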
Database ‘curator’ design
Database Compartments:
Model Metadata Compartment
  contains model descriptions; allows building a coupled model of the needed configuration
Variables Compartment
  list of all related physical variables
Workflow Compartment
  contains scenarios, experiments, institutions, projects and user info
Postprocessing Compartment
  defines the postprocessing plan for conducting an experiment
Data Portal Compartment
  contains info about experiment data
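How the compartments might link together can be sketched with one illustrative table per compartment. The table names are borrowed from the diagrams in this deck, but every column, key and constraint is an assumption, and SQLite is used only because it ships with Python.

```python
import sqlite3

# One illustrative table per compartment; every column is an assumption.
SCHEMA = """
CREATE TABLE Coupled_Models (          -- Model Metadata Compartment
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE Variables (               -- Variables Compartment
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    units TEXT
);
CREATE TABLE Experiments (             -- Workflow Compartment
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE,
    model_id INTEGER NOT NULL REFERENCES Coupled_Models(id)
);
CREATE TABLE Post_Proc (               -- Postprocessing Compartment
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
    description   TEXT
);
CREATE TABLE Data_Files (              -- Data Portal Compartment
    id            INTEGER PRIMARY KEY,
    path          TEXT NOT NULL,
    experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
    variable_id   INTEGER NOT NULL REFERENCES Variables(id)
);
"""


def create_curator_sketch():
    """Create the illustrative schema in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    db.executescript(SCHEMA)
    return db
```

With foreign keys enforced, an experiment cannot reference a coupled model that is not registered, and a data file cannot reference an unknown experiment or variable. This is the kind of cross-compartment consistency a relational design provides.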
Interaction between compartments
Model Metadata Compartment (in development)
Coupled_Models
Model_List
Component_Medias
Models
Experiments
Workflow Compartment
Variables
Variables Compartment
Data Samples from Model Compartment
Components_Medias Coupled_Models
Model_List
Models
Variables Compartment
Projects
Workflow Compartment
Variables Variable_Bundles
Variable_Lists Variable_List_Contents
Proj_Var_Names
Data Sample from Variables Compartment
Variable_Lists Variable_List_Contents
Proj_Var_Names Variables
Variable_Bundles
Workflow Compartment
Institutions GFDL_USERS
Experiment_Status
Realization
Projects
Experiments
Scenarios
Data Samples from Workflow Compartment
Experiments
Scenarios
Postprocessing Compartment
Coupled_Models
PP_Units Post_Proc
PP_Content
Data Samples from Postprocessing Compartment
PP_Units PP_Content
Variable_Lists
Projects GFDL_USERS
Average_Periods
Data Portal Compartment
MissedData_Descriptors
Data_Grids Data_Files
Variables
Experiments
Variable_Bundles
Coupled_Models
Data Samples from Data Portal Compartments
Data_Files
Data_Grids
MissedData_Descriptors
Curator DB on Data Portal stream
The Curator DB is already used on the GFDL Data Portal.
JSP technology with servlets on the back end was applied.
New data transferred onto the Data Portal is automatically registered in the Curator DB with all accompanying metadata.
It turned out to be the fastest way to search for data on the Data Portal:
CM2.0
CM2.1
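A metadata-driven search of this kind reduces to a filtered join over the registered files. The sketch below is hypothetical: the production system uses JSP and servlets, whereas this uses Python with SQLite. The table names follow the compartment diagrams, and the columns, sample rows and the `search_data_files` function are assumptions.

```python
import sqlite3


def build_demo_db():
    """Tiny in-memory stand-in for the Curator DB (hypothetical columns and rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE Coupled_Models (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Experiments (id INTEGER PRIMARY KEY, name TEXT, model_id INTEGER);
        CREATE TABLE Variables (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Data_Files (id INTEGER PRIMARY KEY, path TEXT,
                                 experiment_id INTEGER, variable_id INTEGER);
        INSERT INTO Coupled_Models VALUES (1, 'CM2.0'), (2, 'CM2.1');
        INSERT INTO Experiments VALUES (1, 'exp_a', 1), (2, 'exp_b', 2);
        INSERT INTO Variables VALUES (1, 'var_x'), (2, 'var_y');
        INSERT INTO Data_Files VALUES
            (1, 'cm2.0/exp_a/var_x.nc', 1, 1),
            (2, 'cm2.0/exp_a/var_y.nc', 1, 2),
            (3, 'cm2.1/exp_b/var_x.nc', 2, 1);
        """
    )
    return db


def search_data_files(db, model=None, experiment=None, variable=None):
    """Return file paths matching whichever filters are given (None = any)."""
    filters = {"m.name": model, "e.name": experiment, "v.name": variable}
    clauses = [f"{column} = ?" for column, value in filters.items() if value is not None]
    params = [value for value in filters.values() if value is not None]
    sql = (
        "SELECT f.path FROM Data_Files AS f "
        "JOIN Experiments AS e ON e.id = f.experiment_id "
        "JOIN Coupled_Models AS m ON m.id = e.model_id "
        "JOIN Variables AS v ON v.id = f.variable_id"
    )
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY f.path"
    return [row[0] for row in db.execute(sql, params)]
```

For instance, `search_data_files(build_demo_db(), model="CM2.0")` lists only the files registered under CM2.0 experiments. Column names in the query are fixed in the code and user values are passed as bound parameters, so the filters cannot inject SQL.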
Other Aspects of Future Development
Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
Populate Curator with real metadata extracted from GFDL models.
Integrate the Curator DB with the GFDL Flexible Modeling System (FMS).
Customize the LAS server to use the Curator DB.
Design user interfaces.
END
Questions?
Thanks!