Collaborative data-driven...

Collaborative data-driven science, Mike Rippin

Transcript of Collaborative data-driven...

Page 1: Collaborative data-driven scienceidies.jhu.edu/wp-content/uploads/2016/10/SciServer_IDIES...Collaborative data-driven science 6 Alex Szalay - PI Mike Rippin –PM Ani Thakar, Gerard

Collaborative data-driven science

Mike Rippin

Page 2:

Background and History of SciServer

Major Objectives

Current System

SciServer Compute – Now

SciServer Compute – Future

Q&A

Page 4:

“The Project aims to create a sustainable collaborative ecosystem built around several large scientific data sets for the broader science community, based upon the expertise developed for the Sloan Digital Sky Survey (SDSS) SkyServer and associated projects.”

Page 5:

NSF Cooperative Agreement

5-year duration; first 3 years just completed

Development of Cyberinfrastructure

Science Driven

Page 6:

Alex Szalay - PI

Mike Rippin - PM

Ani Thakar, Gerard Lemson, Jordan Raddick, Bonnie Souter - Associate Directors (Team Leads)

Technical Team: Dmitry Medvedev, Manuchehr Taghizadeh-Popp, Jai Won Kim, Sue Werner, Victor Paul, Jan Vandenberg, Lance Joseph, Alainna White, Laszlo Dobos

Page 7:

Started with the SDSS SkyServer

Goal: instant access to rich content

Idea: bring the analysis to the data

Interactive access at the core
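
One way to read "bring the analysis to the data" is that the aggregation runs inside the database and only the small result travels to the user. The sketch below illustrates the idea with Python's built-in sqlite3 module standing in for a hosted catalog. The table name `objects`, its columns, and the helper function names are made up for this example; they are not the SDSS schema or a SciServer API.

```python
import sqlite3


def build_catalog(n_rows=10_000):
    """Create an in-memory table standing in for a large hosted catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, kind TEXT, mag REAL)")
    rows = [
        (i, "galaxy" if i % 4 == 0 else "star", 15.0 + (i % 100) / 10.0)
        for i in range(n_rows)
    ]
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    return conn


def mean_mag_client_side(conn, kind):
    """Move every matching row to the client, then compute the mean locally."""
    rows = conn.execute("SELECT mag FROM objects WHERE kind = ?", (kind,)).fetchall()
    mean = sum(r[0] for r in rows) / len(rows)
    return mean, len(rows)  # second value: number of rows transferred


def mean_mag_server_side(conn, kind):
    """Run the aggregate where the data lives; a single row comes back."""
    rows = conn.execute("SELECT AVG(mag) FROM objects WHERE kind = ?", (kind,)).fetchall()
    return rows[0][0], len(rows)


conn = build_catalog()
client_mean, client_rows = mean_mag_client_side(conn, "galaxy")
server_mean, server_rows = mean_mag_server_side(conn, "galaxy")
print(f"client side: {client_rows} rows moved")
print(f"server side: {server_rows} row moved")
```

Both functions return the same mean, but the client-side version transfers 2500 rows while the server-side version transfers one. On a petascale catalog that difference is the whole argument.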

Page 8:

Interactive science on petascale data

Create scalable open numerical laboratories

Large footprint across many disciplines

Use commonly shared building blocks

Major national and international impact

Ani Thakar, JHU

Page 10:

Cyber Infrastructure

Science Collaboration

SDSS Integration

Outreach & Education

Page 11:

Cyber Infrastructure

Page 12:

Database storage & Query:

Data analysis:

Data exploration:

User sign-on:

File storage:

Page 15:

Hosted Data

Personal Data

Single Sign-On

Query

Compute

Cyber Infrastructure

Page 16:

Hosted Data

Cyber Infrastructure

Astronomy

Cosmology

Turbulence

Genomics

Materials Science

Oceanography

Page 17:

Compute

Cyber Infrastructure

Server Cluster

Page 18:

Compute

Cyber Infrastructure

Server Cluster

VM VM

Page 19:

Compute

Cyber Infrastructure

VM

Docker Docker Docker

Page 20:

Compute

Cyber Infrastructure

Docker

Jupyter

Page 21:

Compute

Cyber Infrastructure

Docker

Jupyter

INTERACTIVE & SYNCHRONOUS

Page 22:

Engine for executing analysis on data sets

Environment for executing Python Notebooks interactively

Utility API libraries in Python and R

Interacts with ALL other SciServer components that have a WS API:

◦ Login Portal for authentication

◦ CASJobs for queries

◦ SkyServer and SkyQuery for astronomy data

◦ SciDrive for storage
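
The interaction pattern listed above (authenticate once, then call the query and storage components with the same credentials) can be sketched in a few lines of Python. Everything here is a stand-in written for illustration: the classes `LoginPortal`, `QueryService`, and `StorageService`, their methods, and the sample table are hypothetical and run entirely in memory. They are not the SciServer utility libraries or their web service APIs.

```python
import secrets


class LoginPortal:
    """Stand-in for a single sign-on service: issues and validates tokens."""

    def __init__(self):
        self._tokens = {}

    def login(self, user, password):
        # A real portal would verify credentials; this sketch accepts any non-empty password.
        if not password:
            raise PermissionError("login failed")
        token = secrets.token_hex(8)
        self._tokens[token] = user
        return token

    def whoami(self, token):
        try:
            return self._tokens[token]
        except KeyError:
            raise PermissionError("invalid token") from None


class QueryService:
    """Stand-in for a query component: filters rows of a named table."""

    def __init__(self, portal, tables):
        self._portal = portal
        self._tables = tables

    def run(self, token, table, predicate):
        self._portal.whoami(token)  # reject callers without a valid token
        return [row for row in self._tables[table] if predicate(row)]


class StorageService:
    """Stand-in for a storage component: keeps per-user files in a dict."""

    def __init__(self, portal):
        self._portal = portal
        self._files = {}

    def write(self, token, path, data):
        user = self._portal.whoami(token)
        self._files[(user, path)] = data

    def read(self, token, path):
        user = self._portal.whoami(token)
        return self._files[(user, path)]


portal = LoginPortal()
queries = QueryService(portal, {"objects": [{"id": 1, "mag": 17.2}, {"id": 2, "mag": 21.5}]})
storage = StorageService(portal)

token = portal.login("alice", "not-a-real-password")
bright = queries.run(token, "objects", lambda row: row["mag"] < 20)
storage.write(token, "results/bright.txt", repr(bright))
print(storage.read(token, "results/bright.txt"))
```

The point of the sketch is the shape of the session: one token from the login component is passed to every other component, so a notebook can chain a query and a write without signing in again.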

Page 30:

[Diagram: HPC, CloudStore, DB, SkyServer, Compute, SciDrive, SciServer; other labels: indexing, cycles/byte]

Page 31:

Build on VM/Docker Architecture

Scalable non-interactive, asynchronous Job management (JOBM)

Rich Access Controls (RACM)

Distributed compute execution (COMPM)

Support Python, R, Matlab
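
The difference between the interactive mode and non-interactive, asynchronous job management can be illustrated with a small submit-and-poll sketch. It uses Python's standard `concurrent.futures` thread pool as a stand-in for a cluster of containers. The class `JobManager` and its `submit`, `status`, and `result` methods are hypothetical names chosen for this example and say nothing about how JOBM or COMPM are actually designed.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobManager:
    """Minimal asynchronous job queue: submit returns at once, results are fetched later."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Queue a job and return its id without waiting for it to run."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if future.done():
            return "FAILED" if future.exception() else "FINISHED"
        return "RUNNING" if future.running() else "QUEUED"

    def result(self, job_id, timeout=None):
        """Block until the job finishes, then return its value (or re-raise its error)."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def count_primes(limit):
    """A deliberately slow script job: count primes below limit by trial division."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


manager = JobManager()
job_id = manager.submit(count_primes, 10_000)  # returns immediately
print("submitted", job_id, manager.status(job_id))
print("primes below 10000:", manager.result(job_id))
manager.shutdown()
```

In an interactive notebook the user waits on each cell; here the caller gets a job id back right away and can check `status` or collect the result whenever convenient, which is the behavior the asynchronous mode is meant to add.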

Page 32:

[Architecture diagram]

Dashboard UI, SciDrive, Plugins, CASJobs, Jupyter

Resource and Access Control Manager

Job Manager: Job List, Metadata, ACLs, Resources

Script Job, Script Job, Query Job

Compute Manager (two instances), each with a Server Cluster of containers (D)

Connection labels: PUSH, PULL, Async

Page 35:

[Diagram detail: one container (D) expanded]

Jupyter Notebook (INVISIBLE)

<code> { CJ.Query() SD.Write() Open.File() }

CASJobs, Persistent File, SciDrive

MyDB, ScratchDB, Scratch File, SDSS, Turbulence, MatSci

EXTERNAL

Page 36:

SciServer Compute Interactive is live now

Supports Python, R, Jupyter

Runs on a 4-node cluster

Access to several domain databases

Asynchronous Job Execution: early 2017

Please register with SciServer and try it out
