Collaborative data-driven science · Collaborative data-driven science-Data-analysis capability...


Transcript of Collaborative data-driven science · Collaborative data-driven science-Data-analysis capability...

Page 1: Collaborative data-driven science

Page 2: Collaborative data-driven science

A big part of science is about data
(data collection, cleaning, analysis, publishing, mirroring, etc.).

BIG DATA: too large to download to a laptop for analysis (100 TB+).

Bring the analysis close to the data.
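
To make the 100 TB figure concrete, here is a rough back-of-the-envelope sketch of how long such a download would take. The two link speeds are assumptions chosen for illustration, not numbers from the slides, and the estimate ignores protocol overhead and disk space.

```python
def transfer_days(size_bytes, link_bits_per_second):
    """Idealized transfer time in days for a dataset over a fully used link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day

DATASET_BYTES = 100 * 10**12  # 100 TB, the lower bound quoted on the slide

# Assumed link speeds (illustrative only).
for label, bits_per_second in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    days = transfer_days(DATASET_BYTES, bits_per_second)
    print(f"{label}: {days:.1f} days")
```

Under these assumptions the transfer takes on the order of weeks to months, which is the motivation for running the analysis where the data already lives.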


Page 3: Collaborative data-driven science

Give scientists web tools providing…

1) …hosting of huge public/private datasets.

2) …data-intensive computing for everyone.

3) …personal data storage space.

4) …capability for sharing data within a team.

Based at Johns Hopkins University.


Page 4: Collaborative data-driven science

Early 2000s: websites exposing the SDSS database.

SkyServer: for exploring sky objects.

CasJobs: asynchronous SQL queries, personal database storage.
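
The asynchronous query pattern mentioned above (submit a query, continue working, collect the result later) can be sketched roughly as follows. This is not the CasJobs API: it uses Python's standard `sqlite3` and `concurrent.futures` modules as a local stand-in, and the `objects` table, its columns, and the helper names `run_query` and `submit_query` are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    """Run one query against a tiny in-memory table (stand-in for a remote catalog)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical toy catalog; a real service would hold the data server-side.
        conn.execute("CREATE TABLE objects (id INTEGER, ra_deg REAL, dec_deg REAL)")
        conn.executemany(
            "INSERT INTO objects VALUES (?, ?, ?)",
            [(1, 10.5, -1.2), (2, 185.0, 15.8), (3, 200.1, 30.0)],
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

def submit_query(executor, sql):
    """Submit a query and return immediately with a handle to the pending job."""
    return executor.submit(run_query, sql)

with ThreadPoolExecutor(max_workers=1) as executor:
    job = submit_query(
        executor, "SELECT id FROM objects WHERE ra_deg > 100 ORDER BY id"
    )
    # The caller is free to do other work here while the query runs.
    rows = job.result(timeout=30)  # block only when the result is needed

print(rows)
```

The point of the design is that a long-running query does not tie up the user's session; the result is fetched (or, in a hosted service, stored in personal database space) once the job completes.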


Page 5: Collaborative data-driven science

-Data-analysis capability with Jupyter Notebooks.
  - Python, R (RStudio), Matlab, Julia, ...
  - terminal: conda/pip, git, gcc, …

-Creation of teams and sharing of private resources.
  - use in the classroom: courseware
  - discussed in GWS1

-Expansion to all sciences: Genomics, Oceanography, Materials Science, Turbulence, Humanities, Health, …



Page 8: Collaborative data-driven science

Data-intensive computing with Jupyter Notebooks in Docker Containers.

-Containers give isolated Linux environment.

-Private or public data volumes in file system.

-Notebooks in Python, R, Matlab.

-SciScript libraries: for loading external data into Notebook.

-Also Notebooks as batch jobs.
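
As a rough illustration of the ideas on this slide (an isolated container, a data volume mounted into its file system, and a notebook executed as a batch job), the sketch below assembles a `docker run` command that executes a notebook non-interactively with `jupyter nbconvert --execute`. It only builds and prints the command; it does not describe how the platform itself launches jobs. The image name, the `/data` and `/work` mount points, the example paths, and the helper name `batch_notebook_command` are all assumptions.

```python
import shlex

def batch_notebook_command(notebook, host_work_dir, host_data_dir,
                           image="jupyter/scipy-notebook"):
    """Build (but do not run) a docker command that executes a notebook in batch mode."""
    return [
        "docker", "run", "--rm",
        # Dataset volume mounted read-only inside the container (assumed path /data).
        "-v", f"{host_data_dir}:/data:ro",
        # Working directory holding the notebook; the executed copy is written here.
        "-v", f"{host_work_dir}:/work",
        "-w", "/work",
        image,
        "jupyter", "nbconvert", "--to", "notebook", "--execute",
        "--output", f"executed_{notebook}",
        notebook,
    ]

# Hypothetical paths, for illustration only.
cmd = batch_notebook_command("analysis.ipynb", "/home/user/project", "/srv/datasets")
print(shlex.join(cmd))
```

Mounting the dataset read-only keeps a shared volume safe from accidental writes, while the separate working directory gives each job its own place for outputs.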
