EARTHCUBE CONCEPTUAL DESIGN A Scalable Community Driven Architecture http://earthcube.org/group/scalable-community-driven-architecture Overview PI: G. Djorgovski (Caltech) CO-I: D. Pilone, T. Pilone (Element 84), D. Crichton, E. Law (JPL) Other key personnel: S. Caltagirone (E84), S. Hughes (JPL), T. Huang (JPL), A. Mahabal (Caltech) 1/7/16 1 2016 ESIP Winter Meeting


Page 1

EARTHCUBE CONCEPTUAL DESIGN

A Scalable Community Driven Architecture http://earthcube.org/group/scalable-community-driven-architecture

Overview PI: G. Djorgovski (Caltech)

CO-I: D. Pilone, T. Pilone (Element 84), D. Crichton, E. Law (JPL)

Other key personnel: S. Caltagirone (E84), S. Hughes (JPL),

T. Huang (JPL), A. Mahabal (Caltech)

1/7/16 1 2016 ESIP Winter Meeting

Page 2

A high-level system blueprint for the definition, construction, and deployment of both existing and new components, ensuring that they can be unified and integrated into an evolutionary national infrastructure for EarthCube

Page 3

Methodology

• Identification of stakeholders, concerns and requirements
• Identification of architectural use cases and drivers
• Selection of an architectural framework
• Development of the architectural principles
• Development of the architectural models
• Capture of the architecture artifacts in a consolidated report
• Generation of recommendations for adopting the architecture for the EarthCube program

Page 4

Page 5

Stakeholders (Stakeholder/Actor: Concerns)

• NSF Program Managers: Make decisions and provide guidance at the EarthCube program level. Provide sufficient funding to support the EarthCube mission.
• EarthCube Scientists: Use EarthCube resources and services to conduct scientific research. Publish scientific results and curate data as needed.
• EarthCube Developers: Develop technologies and services that can be integrated into EarthCube.
• EarthCube Architects: Establish EarthCube requirements, framework and operational concept. Develop the information model (vocabulary, ontology). Establish standards guidelines. Ensure interoperability between EarthCube Building Blocks.
• External Data Users: Use EarthCube resources and services for research, education, and decision-making.
• Curator: Ensure data is properly captured in EarthCube-compliant data repositories.
• Data Owner: Responsible for producing the data. Concerned about its distribution and use.
• External Data Facility: Responsible for archiving data at other agencies (NASA, NOAA, USGS, etc.); interoperability with the EarthCube Cyberinfrastructure.
• EarthCube Governance Committees: Responsible for generating and monitoring the governance for the system, including data curation, access, use case priority, interoperability standards, etc.
• EarthCube Office Staff: Responsible for maintaining community involvement within EarthCube and communicating changes and how to use the system.

Page 6

Use Cases

• Big Science – Discovery, Comparison, Provenance, Model & visualization
• Collaborative Science
• Dark Data Contribution
• Tools Contribution
• Data Documentation
• Models Sharing
• High Performance Computing and Storage Resources
• Real Time Data
• Physical Sample Curation

Page 7

Drivers

• Transform and accelerate research and discovery by turning data into knowledge and enabling interdisciplinary data integration.
• Provide critically needed data, tools, computational resources, and frameworks for cross-domain scientific collaboration and analysis, together with long-term preservation, discovery, and use of geoscience software and data.
• Provide a geosciences cyberinfrastructure and architecture that is scalable, extensible, and sustainable.

Page 8

Frameworks

• Zachman Framework – For organizing stakeholder concerns and perspectives.
• ISO/IEC/IEEE 42010:2011 – For architectural description guidelines.
• Reference Model for Open Distributed Processing (RM-ODP) – For architectural patterns for distributed systems.
• The Open Group Architecture Framework (TOGAF) – For managing the architecture.
• Federal Enterprise Architecture Framework (FEAF) – For classifying the architecture into architectural elements and viewpoints.
• ISO 14721:2003 Open Archival Information System (OAIS) Reference Model – Provides a standard for information objects.
• ISO/IEC 11179-3 Registry Metamodel and Basic Attributes specification – Provides a schema for a metadata registry.
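To illustrate what an ISO/IEC 11179-3 style metadata registry provides, the following Python sketch models a data element as an object class plus a property, bound to a value domain that constrains permissible values, and then uses the registry to check a record. It is a heavily simplified, hypothetical rendering: the class names, fields, and the sea surface temperature element are assumptions for illustration, not the normative metamodel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueDomain:
    """Permissible values for a data element (simplified, hypothetical)."""
    datatype: type
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    enumerated: frozenset = frozenset()

    def accepts(self, value) -> bool:
        if not isinstance(value, self.datatype):
            return False
        if self.enumerated and value not in self.enumerated:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass(frozen=True)
class DataElement:
    """Data element = object class + property, bound to a value domain."""
    identifier: str
    object_class: str
    property_name: str
    definition: str
    domain: ValueDomain


class MetadataRegistry:
    """Minimal in-memory registry keyed by data element identifier."""

    def __init__(self):
        self._elements = {}

    def register(self, element: DataElement) -> None:
        if element.identifier in self._elements:
            raise ValueError(f"duplicate identifier: {element.identifier}")
        self._elements[element.identifier] = element

    def validate(self, record: dict) -> list:
        """Return the keys in `record` that are unregistered or out of domain."""
        problems = []
        for key, value in record.items():
            element = self._elements.get(key)
            if element is None or not element.domain.accepts(value):
                problems.append(key)
        return problems
```

Keeping the value domain separate from the data element reflects the standard's idea that one domain (for example, a temperature range in kelvin) can be shared by several data elements.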

Page 9

• Scalability
• Community Driven
• Open Science
• Interoperability
• Sustainability
• Distributed
• Data Model Driven

Page 10

[Figure: Data Provider systems (Satellite/Instrument Data Systems, Airborne Data, and Agency Earth Data Archives, each with its own science data management) shown with the EarthCube CI and EarthCube Discovery.]

Page 11

[Figure: Data Provider systems (Satellite/Instrument Data Systems, Airborne Data, Agency Earth Data Archives, and Other Data Systems such as NOAA, In-Situ, and University systems) shown with the EarthCube CI, which now includes an EarthCube Repository alongside EarthCube Discovery.]

Page 12

[Figure: Data Provider systems (Satellite/Instrument Data Systems, Airborne Data, Agency Earth Data Archives, and Other Data Systems such as NOAA, In-Situ, and University systems) shown with the EarthCube CI (EarthCube Repository, EarthCube Discovery), now joined by a Data Science Infrastructure (Data, Algorithms, Machines) used by Science Teams.]

Page 13

[Figure: The full conceptual architecture. Data Provider systems (Satellite/Instrument Data Systems, Airborne Data, Agency Earth Data Archives, and Other Data Systems such as NOAA, In-Situ, and University systems) feed the EarthCube CI, comprising the EarthCube Repository, EarthCube Discovery, the Data Science Infrastructure (Data, Algorithms, Machines), and EarthCube Data Analytics Centers. Science Teams perform Data Analysis in support of Research, Applications, and Decision Support.]

Page 14

Benchmark

• Earth System Grid Federation (ESGF)
• Early Detection Research Network (EDRN)
• NASA’s Earth Observing System Data and Information System (EOSDIS)

ExArch Meeting, October 2012

Node Architecture

• Internally, each ESGF Node is composed of services and applications that collectively enable data and metadata access, and user management.
• The ESGF software stack combines custom software components developed by ESGF with other freely available applications from eCommerce (Apache Tomcat, Solr, Postgres, ...) and geo-informatics (THREDDS Data Server, LAS, ...)
• Software components are grouped into 4 areas of functionality (aka “flavors”):
  • Data Node: secure data publication and access
  • Index Node:
    ‣ metadata indexing and searching
    ‣ web portal UI to drive human interaction
    ‣ dashboard suite of admin applications
    ‣ model metadata viewer plugin
  • Identity Provider: user authentication and group membership
  • Compute Node: analysis and visualization
• Node flavors can be installed in various combinations depending on site needs, or to achieve higher performance and scalability
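The idea that node flavors can be installed in any combination maps naturally onto a bit-flag type. The Python sketch below is illustrative only: the `Flavor` enum, the `SERVICES` table (paraphrased from the slide), and `services_for` are hypothetical names, not part of the ESGF software stack.

```python
from enum import Flag, auto


class Flavor(Flag):
    """ESGF node flavors as combinable bit flags (illustrative only)."""
    DATA = auto()
    INDEX = auto()
    IDP = auto()
    COMPUTE = auto()


# Functionality attributed to each flavor on the slide (paraphrased).
SERVICES = {
    Flavor.DATA: ["secure data publication and access"],
    Flavor.INDEX: ["metadata indexing and searching", "web portal UI",
                   "admin dashboard", "model metadata viewer"],
    Flavor.IDP: ["user authentication", "group membership"],
    Flavor.COMPUTE: ["analysis and visualization"],
}


def services_for(site: Flavor) -> list:
    """List what a site provides; the Node Manager is installed for every flavor."""
    services = ["node manager"]
    for flavor, names in SERVICES.items():
        if flavor in site:
            services.extend(names)
    return services
```

For example, `services_for(Flavor.DATA | Flavor.INDEX)` describes a site that publishes data and also hosts a search index and portal, without an identity provider or compute services.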

ExArch Meeting, October 2012

Software Stack: Node Manager

• Enables continuous exchange of service and state information among Nodes
• Internally, it collects Node health information and metrics (cpu, disk usage, etc.)
• Installed for all Node flavors

Peer-To-Peer (P2P) protocol
• Gossip protocol: information is exchanged randomly among peers
  ‣ Each Node receives information from one Node, merges it with its own information, and propagates it to two other Nodes at random
  ‣ No central coordination, no single point of failure
• Nodes can join/leave the federation dynamically
• Each Node is bootstrapped with knowledge of one default peer
• Each Node can belong to one or more peer groups within which information is exchanged
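The gossip exchange described above can be simulated in a few lines. This is a simplified Python sketch, not the ESGF Node Manager: it assumes each node's state maps node names to a (version, health) pair, that merging keeps the higher version of each entry, and that every node pushes its state to two random peers per round, approximating the fan-out of two on the slide. Node names and health fields are hypothetical.

```python
import random


def merge(local: dict, incoming: dict) -> None:
    """Merge gossip state in place, keeping the newest version of each entry."""
    for node, (version, health) in incoming.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, health)


def gossip_round(states: dict, rng: random.Random, fanout: int = 2) -> None:
    """Every node pushes a snapshot of its state to `fanout` random peers."""
    names = list(states)
    # Snapshot first so each push carries what the sender knew at round start.
    snapshot = {name: dict(state) for name, state in states.items()}
    for sender in names:
        others = [name for name in names if name != sender]
        for peer in rng.sample(others, k=min(fanout, len(others))):
            merge(states[peer], snapshot[sender])


def rounds_to_converge(num_nodes: int, seed: int = 0, max_rounds: int = 100) -> int:
    """Count rounds until every node holds every other node's health entry."""
    rng = random.Random(seed)
    # Initially each node knows only its own (version 1) health report.
    states = {
        f"node{i}": {f"node{i}": (1, {"cpu": 0.1 * i, "disk": 0.5})}
        for i in range(num_nodes)
    }
    for round_number in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(state) == num_nodes for state in states.values()):
            return round_number
    raise RuntimeError("gossip did not converge")
```

Because merging only adds or refreshes entries, a node that leaves simply stops publishing newer versions, and its stale entry lingers until something expires it; this is one way to read the slide's remark that bad news travels slowly.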

XML Registry
• XML document that is the payload of the P2P protocol
• Contains service endpoints and SSL public keys for all Nodes in the federation
• Derived products (list of search shards, trusted IdPs, location of Attribute Services, ...) are used by federation-wide services
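To make the registry idea concrete, the Python sketch below parses a small XML document of the kind described and derives one federation-wide product, the list of search shards. The element and attribute names (`registry`, `node`, `service`, `publicKey`) and the example hosts are invented for illustration; the actual ESGF registry schema is not given on these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry payload; the real ESGF schema differs.
SAMPLE = """
<registry>
  <node name="node-a" hostname="a.example.org">
    <service type="search" endpoint="https://a.example.org/solr"/>
    <service type="idp" endpoint="https://a.example.org/openid"/>
    <publicKey>KEY-A</publicKey>
  </node>
  <node name="node-b" hostname="b.example.org">
    <service type="search" endpoint="https://b.example.org/solr"/>
    <publicKey>KEY-B</publicKey>
  </node>
</registry>
"""


def parse_registry(xml_text: str) -> dict:
    """Map node name -> {"endpoints": {service type: url}, "public_key": str}."""
    root = ET.fromstring(xml_text.strip())
    nodes = {}
    for node in root.findall("node"):
        endpoints = {svc.get("type"): svc.get("endpoint") for svc in node.findall("service")}
        key = node.findtext("publicKey", default="").strip()
        nodes[node.get("name")] = {"endpoints": endpoints, "public_key": key}
    return nodes


def search_shards(nodes: dict) -> list:
    """Derived product: every search endpoint in the federation, sorted."""
    return sorted(info["endpoints"]["search"]
                  for info in nodes.values() if "search" in info["endpoints"])
```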

Challenge: good news travels fast, bad news travels slow...

[Figure: EOSDIS data centers and SIPSs]

Data centers:
• ASF DAAC: SAR products, sea ice, polar processes
• SEDAC: human interactions in global change
• LP DAAC: land processes & features
• PO.DAAC: ocean circulation, air-sea interactions
• ASDC: radiation budget, clouds, aerosols, tropospheric chemistry
• ORNL DAAC: biogeochemical dynamics, EOS land validation
• GES DISC: atmospheric composition & dynamics, global modeling, hydrology, radiance
• LAADS/MODAPS: atmosphere
• OBPG: ocean biology & biogeochemistry
• GHRC: hydrological cycle & severe weather
• CDDIS: crustal dynamics, solid Earth
• NSIDC DAAC: cryosphere, polar processes

SIPSs:
• NCAR, U of Col.: HIRDLS, MOPITT, SORCE
• GSFC: GLAS, MODIS, OMI, OBPG
• LaRC: CERES, SAGE III
• GHRC: AMSR-E, LIS, AMSR2
• JPL: MLS, TES
• San Diego: ACRIM

Legend: SIPSs, Key Data Center, ECS Sites

Page 15

Architecture Elements

[Figure: EarthCube System Architecture]

Process Architecture
• Data Lifecycle: Data Generation, Data Curation, Data Transport, Data Ingest, Data Management, Search/Distribution, Data Analytics, Visualization
• Software Lifecycle: Technology Planning, Software Development, Release
• Administrative: Governance, Standards, Technology, Policies, Resource Planning

Data Architecture
• Information Model, Archive Model, Query/Access, Data Formats, Archive Organization, Grammar, Data Dictionary

Technology Architecture
• Ingest (Receive, Validate, Accept), Catalog/Data Management, Storage (Repository), Processing, Search and Discovery, Data Integration, Data Analysis, Distribution, Visualization

Also shown: Distributed Architecture, Data Access, IT Security, Collaboration, Publication

Domain Crosscutting
• Research Software Lifecycle: Software Development, Software Versioning, Software Archiving, Software Search & Distribution, Algorithm Storage & Discovery
• Data Standards Evaluation
• User Roles, Support and Feedback
• Use metrics for data, software and site use

Page 16

Data Lifecycle

• Data Generation: Original generation of data (from sensors, investigators, etc.)
• Data Curation and Preparation: Prepare data for use and submission into EarthCube
• Data Transport: Maximize information throughput against available bandwidth
• Data Ingest: Supports the capture and validation of data into EarthCube
• Data Management: Provides overall data management services for the data in EarthCube
• Discovery, Access & Distribution: Enables discovery, access and distribution of the data
• Data Analytics: Enables the analysis of massive, distributed, heterogeneous data
• Visualization: Provides a platform for integrating analytics with rendering and understanding the data
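The Data Ingest stage ("capture and validation of data into EarthCube", broken down elsewhere in this deck as Receive, Validate, Accept) can be sketched as three small Python functions. This is a minimal illustration under assumed rules: the required metadata fields, the SHA-256 checksum check, and the in-memory `Catalog` are hypothetical choices, not requirements stated in the conceptual architecture.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical minimum metadata a submission must carry.
REQUIRED_FIELDS = {"title", "creator", "format"}


@dataclass
class Submission:
    payload: bytes
    metadata: dict
    declared_sha256: str


@dataclass
class Catalog:
    entries: dict = field(default_factory=dict)


def receive(payload: bytes, metadata: dict, declared_sha256: str) -> Submission:
    """Receive: wrap incoming bytes and metadata without interpreting them."""
    return Submission(payload, dict(metadata), declared_sha256)


def validate(sub: Submission) -> list:
    """Validate: return a list of problems; an empty list means valid."""
    missing = sorted(REQUIRED_FIELDS - set(sub.metadata))
    problems = [f"missing metadata: {name}" for name in missing]
    if hashlib.sha256(sub.payload).hexdigest() != sub.declared_sha256:
        problems.append("checksum mismatch")
    return problems


def accept(sub: Submission, catalog: Catalog) -> str:
    """Accept: register a valid submission under a content-derived id."""
    problems = validate(sub)
    if problems:
        raise ValueError("; ".join(problems))
    dataset_id = sub.declared_sha256[:16]
    catalog.entries[dataset_id] = sub.metadata
    return dataset_id
```

Deriving the identifier from the content hash is one design option among several; it makes re-submission of identical data idempotent.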

Page 17

Information Model Context

Page 18

Framework

Data Node
• Sources: images, measurements/observations, remote sensing, text files/ASCII, spreadsheets, metadata, etc.
• Data Ingest: transfer, validation, metadata, harvesting, packaging
• Data Management: abstraction layer (Java, Python, Ruby, Groovy, Scala, …) over
  ‣ Search: OpenSearch, Lucene, Solr, ElasticSearch
  ‣ RDBMS: Postgres, Oracle, MySQL
  ‣ NoSQL: MongoDB, Cassandra
  ‣ Array: SciDB
  ‣ Storage: SAN, S3, SSD
• Data Distribution: search, metadata publication, data push, data access (OPeNDAP, W10N, THREDDS)

Analytic Node
• Data Analysis: science workflow; analytics (machine learning, pattern recognition, climatologies, data reduction, uncertainty analysis, etc.)
• Visualization: OGC (WMS, WMTS, …), TWMS, LAS, data slices, plots and coordination, integrated views
• Query/Retrieval: Lucene, OpenSearch, SPARQL, etc.; search, query, subset, etc.
• Data Viewer and Interactive Query
• Data Science Framework / Analysis Platform: Hadoop/HDFS (MapReduce, ZooKeeper, Spark); Graph DB (TitanDB, Neo4J); Triple Store (Virtuoso, AllegroGraph, Sesame, Fuseki); Message Passing Interface; single machine, high performance computing, GPU; virtual machine, container

Also shown: Data Providers, Applied Science, Data Stewardship/Curation, and Data, Information, Knowledge.
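The Data Node's Data Management "abstraction" layer separates client code from the storage choice (RDBMS, NoSQL, array store, object storage). A minimal Python sketch of that decoupling follows; the `DataStore` interface and both backends are hypothetical, with an in-memory dictionary standing in for a key-value or object store and SQLite standing in for the relational options named on the slide.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class DataStore(ABC):
    """Storage-agnostic interface that ingest and distribution code depend on."""

    @abstractmethod
    def put(self, key: str, record: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...


class MemoryStore(DataStore):
    """Stand-in for a key-value or object store."""

    def __init__(self):
        self._data = {}

    def put(self, key, record):
        self._data[key] = json.dumps(record)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


class SQLiteStore(DataStore):
    """Stand-in for a relational backend such as Postgres or MySQL."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS records (key TEXT PRIMARY KEY, body TEXT)")

    def put(self, key, record):
        with self._conn:  # commits the transaction on success
            self._conn.execute("INSERT OR REPLACE INTO records VALUES (?, ?)",
                               (key, json.dumps(record)))

    def get(self, key):
        row = self._conn.execute(
            "SELECT body FROM records WHERE key = ?", (key,)).fetchone()
        return None if row is None else json.loads(row[0])


def publish(store: DataStore, key: str, record: dict) -> dict:
    """Client code written once against the interface, whatever the backend."""
    store.put(key, record)
    return store.get(key)
```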

Page 19

Example Instantiation

[Figure: An example instantiation of the framework within the EarthCube Cyberinfrastructure. Data providers (Satellite Information Data Systems, Airborne Data, Agency Earth Data Archives, and Other Data Systems such as In-Situ and University systems) feed EarthCube Data Management Nodes, each with sources, a Data Ingest API, a data management abstraction, data distribution, and data stewardship/curation. Some are paired with Data Analytic Nodes (data analysis, visualization, a query/retrieval API, a data viewer and interactive query API, and an analysis platform). EarthCube Repositories, the EarthCube Data Science Infrastructure, EarthCube Data Analytics Centers, and EarthCube Discipline-Specific Data Management with a Data Analytic Node complete the picture, serving Research, Applications, Applied Science, and Decision Support.]

Page 20

Thank You

Questions?

Page 21

EarthCube Conceptual Architecture Discussion

The controversial bits…

Page 22

THIS IS A DISCUSSION.

Please Talk.

Page 23

[Figure: Stakeholder personas: EarthCube Architect, EarthCube Developer, EarthCube Scientist, Curator, NSF Program Manager, External Data Users, External Data Facility, EarthCube Staff, Governance Committee]

Page 24

Stakeholders

• Do we have the right stakeholders?
• Do they overlap at all? Too much?
• Are they useful for providing use cases and personas that help drive the system?
• Are we missing key stakeholders?

Page 25

Stakeholders

• NSF Program Managers
• EarthCube Scientists
• EarthCube Developers
• EarthCube Architects
• External Data Users
• Curators
• Data Owner
• External Data Facility
• EarthCube Governance Committees
• EarthCube Office Staff

Page 26

Architectural Principles

• Federation
• Sustainability
• Standards
• (Data) Model-Driven
• Extensibility
• Scalability
• Provenance
• Security

Page 27

Standards…

• We do not advocate a particular standard…
• Our Conceptual Architecture emphasizes fully defined, self-contained data rather than prescribing standard(s).
• The heterogeneity of EarthCube’s data, applications, and systems appears to justify the possible increase in complexity that this approach brings.
• Common models and representations should be used.
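One way to read "fully defined and self-contained data" is a package that carries its own schema, so a consumer can interpret it without knowing which standard the producer followed. The Python sketch below is a hypothetical illustration of that idea; the package layout and type names are invented, not an EarthCube format, and JSON is used only as a convenient container.

```python
import json

# Type names a package may declare, mapped to Python converters (assumed set).
CONVERTERS = {"float": float, "int": int, "string": str}


def make_package(schema: dict, rows: list) -> str:
    """Bundle rows with the schema that describes them into one JSON document."""
    for row in rows:
        if set(row) != set(schema):
            raise ValueError(f"row fields {sorted(row)} do not match schema")
    return json.dumps({"schema": schema, "rows": rows})


def read_package(text: str) -> list:
    """Interpret a package using only its embedded schema."""
    package = json.loads(text)
    schema = package["schema"]
    return [
        {name: CONVERTERS[spec["type"]](row[name]) for name, spec in schema.items()}
        for row in package["rows"]
    ]
```

For example, a schema such as `{"station": {"type": "string"}, "depth": {"type": "float", "unit": "m"}}` travels with the rows, so units and types never have to be looked up elsewhere.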

Page 28

EarthCube Software Lifecycle Processes


Technology Planning

Software Development

Release

Page 29

Research Software Lifecycle Processes


Technology Planning

Software Development

Software Versioning

Software Search and Distribution

Algorithm Search and Distribution
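The research software processes listed above (versioning, search and distribution) imply some registry of released versions. A minimal Python sketch, assuming semantic-version strings of the form MAJOR.MINOR.PATCH, immutable releases, and keyword search over descriptions; the `SoftwareRegistry` class, its fields, and the example package names are hypothetical.

```python
from dataclasses import dataclass, field


def parse_version(text: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple that sorts numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


@dataclass
class SoftwareRegistry:
    # name -> {version string -> release metadata}
    releases: dict = field(default_factory=dict)

    def publish(self, name: str, version: str, description: str, url: str) -> None:
        """Record a release; an existing version is never overwritten."""
        parse_version(version)  # reject malformed version strings early
        versions = self.releases.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} {version} already published")
        versions[version] = {"description": description, "url": url}

    def latest(self, name: str) -> str:
        """Highest version by numeric comparison, so 1.10.0 beats 1.9.0."""
        return max(self.releases[name], key=parse_version)

    def search(self, keyword: str) -> list:
        """Names whose latest release description mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            name for name in self.releases
            if keyword in self.releases[name][self.latest(name)]["description"].lower()
        )
```

Comparing parsed tuples rather than raw strings matters: as strings, "1.9.0" would sort after "1.10.0".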

Page 30

Software Lifecycle Processes

• We place an emphasis on software versioning, discovery, etc. for Research Software. Should we treat “EarthCube proper” processes the same way?
• What about discovery and distribution?

Page 31

Metrics

• Use Examples:
  ‣ Product Searches
  ‣ Products Downloaded
  ‣ Services Accessed
  ‣ Publications Cited
• Quality Examples:
  ‣ Ingestion speed
  ‣ Search Response Time
  ‣ User “conversions”
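The use and quality metrics listed above can be computed from an ordinary event log. A Python sketch under assumed definitions: the event fields are hypothetical, and a "conversion" is taken here to mean a user who searched and later downloaded a product, which is only one possible reading of the slide's term.

```python
from collections import Counter
from statistics import median


def summarize(events: list) -> dict:
    """Aggregate use counts and simple quality metrics from time-ordered events.

    Each event is assumed to look like
    {"user": str, "kind": "search" | "download" | "service", "ms": float (searches)}.
    """
    counts = Counter(event["kind"] for event in events)
    search_times = [event["ms"] for event in events
                    if event["kind"] == "search" and event.get("ms") is not None]
    searched, converted = set(), set()
    for event in events:
        if event["kind"] == "search":
            searched.add(event["user"])
        elif event["kind"] == "download" and event["user"] in searched:
            converted.add(event["user"])  # downloaded after an earlier search
    return {
        "product_searches": counts["search"],
        "products_downloaded": counts["download"],
        "services_accessed": counts["service"],
        "median_search_ms": median(search_times) if search_times else None,
        "conversion_rate": len(converted) / len(searched) if searched else 0.0,
    }
```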

Page 32

Metrics & Conceptual Architectures

• Is this the right place to advise on or mandate metrics? (e.g. we’re not doing this for standards)
• Should we be specific or just provide categories?
• Do we go so far as to “mandate” them for EarthCube components / building blocks / etc.?

Page 33

[Figure: The conceptual architecture diagram from Page 13, repeated for discussion: data providers, the EarthCube CI (EarthCube Repository, EarthCube Discovery, Data Science Infrastructure, EarthCube Data Analytics Centers), Science Teams, Data Analysis, Research, Applications, and Decision Support.]

Page 34

Places we haven’t expressed an opinion

• Cloud vs. on-premises hosting
• Data location (hosted vs. distributed)
• Compute location

Should we?

Page 35

Best Practices

• Common Software Stack
• Common Data Model
• Standard Interfaces
• Service-Oriented Architecture
• Decoupled Storage, Compute, and Data Management
• Federated Search
• Analytic Services
• Visualization
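Federated Search means querying each node's own index and merging the answers rather than maintaining one central index. A minimal Python sketch using `concurrent.futures`: the per-node search callables, the score-based merge, and the timeout are assumptions for illustration, and a failed or slow node is reported and skipped rather than failing the whole query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A node search returns (score, record_id) pairs; higher scores rank first.
NodeSearch = Callable[[str], List[Tuple[float, str]]]


def federated_search(nodes: Dict[str, NodeSearch], query: str,
                     limit: int = 10, timeout_s: float = 2.0) -> Tuple[list, list]:
    """Fan the query out to every node, merge hits by score, report failed nodes."""
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {name: pool.submit(search, query) for name, search in nodes.items()}
        for name, future in futures.items():
            try:
                for score, record_id in future.result(timeout=timeout_s):
                    merged.append((score, name, record_id))
            except Exception:  # timeout or an error raised by the node
                failed.append(name)
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:limit], failed
```

The `with` block still waits for a slow node's thread before returning, and each `result` call has its own timeout; a production service would bound total latency and cancel outstanding requests.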

Page 36

Misc Questions

• How do we make this real?
• What’s the next thing you need to make EarthCube more valuable to you?
• How can the Conceptual Architecture effort help you get there?

Page 37

Our Next Steps

1. Solicit Reviewers for Conceptual Architecture Document (NOW!)
2. Incorporate feedback and review comments
3. Write actionable recommendations and incorporate into final Conceptual Architecture
4. Prioritize and Deploy Key Architectural Components

Page 38

We need reviewers! Please contact Emily Law if you’re interested.

Thank you!
