DR / CNRS Pr / ENSAE LAL & LRI Laboratoire de Statistique ... · Pr / ENSAE! Laboratoire de...


Page 1: DR / CNRS Pr / ENSAE LAL & LRI Laboratoire de Statistique ... · Pr / ENSAE! Laboratoire de Statistique! ENSAE / CREST ARNAK DALALYAN. B. Kégl / AppStat@LAL Learning to discover

Center for Data Science Paris-Saclay

BALÁZS KÉGL
DR / CNRS, LAL & LRI
CNRS & University Paris-Sud

ARNAK DALALYAN
Pr / ENSAE, Laboratoire de Statistique
ENSAE / CREST

Page 2:

DATA SCIENCE

Design of automated methods to analyze massive and complex data to extract useful information

Page 3:

THE DATA SCIENCE LANDSCAPE

Roles: data scientist, data trainer, applied scientist, domain expert, software engineer, data engineer

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain

Page 4:

THE PARTICIPANTS

The Paris-Saclay Center for Data Science: Data Science for scientific Data

250 researchers in 35 laboratories

Biology & bioinformatics: IBISC/UEvry, LRI/UPSud, Hepatinov, CESP/UPSud-UVSQ-Inserm, IGM-I2BC/UPSud, MIA/Agro, MIAj-MIG/INRA, LMAS/Centrale

Chemistry: EA4041/UPSud

Earth sciences: LATMOS/UVSQ, GEOPS/UPSud, IPSL/UVSQ, LSCE/UVSQ, LMD/Polytechnique

Economy: LM/ENSAE, RITM/UPSud, LFA/ENSAE

Neuroscience: UNICOG/Inserm, U1000/Inserm, NeuroSpin/CEA

Particle physics, astrophysics & cosmology: LPP/Polytechnique, DMPH/ONERA, CosmoStat/CEA, IAS/UPSud, AIM/CEA, LAL/UPSud

Machine learning: LRI/UPSud, LTCI/Telecom, CMLA/Cachan, LS/ENSAE, LIX/Polytechnique, MIA/Agro, CMA/Polytechnique, LSS/Supélec, CVN/Centrale, LMAS/Centrale, DTIM/ONERA, IBISC/UEvry

Visualization: INRIA, LIMSI

Signal processing: LTCI/Telecom, CMA/Polytechnique, CVN/Centrale, LSS/Supélec, CMLA/Cachan, LIMSI, DTIM/ONERA

Statistics: LMO/UPSud, LS/ENSAE, LSS/Supélec, CMA/Polytechnique, LMAS/Centrale, MIA/AgroParisTech

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Domain science: human society, life, brain, earth, universe

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Roles: data scientist, applied scientist, domain scientist, data engineer, software engineer

datascience-paris-saclay.fr

@SaclayCDS

Page 5:

PARAMETERS

• 2 years: April 2014 - June 2016, 1.2M€

• +1 year, conditional on evaluation

• Light management

• executive committee of 17 members

• work groups

• management (around objectives)

• thematic (around scientific themes), open to everyone to propose and to participate


Page 6:

GOALS

• Build a community at Saclay around data science

• Get interdisciplinary collaborations off the ground

• Support software tool building

• Data science IT platform (open data)

• Make the CDS a contact point for the big data industry


Page 7:

TOOLS


• We are designing and learning to manage tools to accompany data science projects with different needs

• If you have a project (or even some data and some ideas), you could choose from this “menu” of tools

Page 8:

TOOLS

Roles: data scientist, data trainer, applied scientist, domain expert, software engineer, data engineer

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain

• interdisciplinary projects
• matchmaking tool
• design and innovation strategy workshops
• data challenges

• coding sprints
• Open Software Initiative
• code consolidator and engineering projects

• data science bootcamps
• IT platform for linked data
• SaaS data science platform

Alex, Balázs, Karima, Akin

Page 9:

POSTDOCS, THESES, SABBATICALS

• Goals

• seeding larger projects for outside funding (ANR, Europe)

• embedding data scientists in domain science labs and vice versa


Page 10:

POSTDOCS, THESES, SABBATICALS

• Common selection criteria

• scientific quality

• expected results both in domain science and data science

• relevance and feasibility

• (real) scientific data, available at the start date of the project

• interdisciplinarity (PIs both from domain and data sciences, different LABEXs)

• community building

• organizing and participating in thematic days, bootcamps, workshops


Page 11:

POSTDOCS, THESES

• Schedule

• 5 postdocs in first call (July 2014)

• 6 theses (within COFUND, under evaluation)

• 2-3 projects in second call (planned in March 2015)

• a third call in 2016 in case the CDS is extended


Page 12:

SABBATICALS, VISITING FELLOWSHIPS

• 3-12 months visiting (incoming) fellowships

• typically paying for replacement teaching

• open continuously


Page 13:

BOOTCAMPS

• Single-day coding sessions

• 20-30 participants

• Goals

• training PhD students, postdocs, engineers, senior researchers for hands-on data science (problem types, tools)

• solving (prototyping) real data science problems

• networking, getting to know each other


Page 14:

BOOTCAMPS

• If you want to participate

• fill out a (longish) form: we need to know you

• sign up on the indico for upcoming sessions

• come prepared (read the material, install the starting kit, etc.)


Page 15:

BOOTCAMPS

• If you want to propose a bootcamp

• contact us

• think about the data set and the problem you’d like to solve

• it’s OK if it’s not completely mature yet: we can run a workshop to define the project


Page 16:

THANK YOU!


Page 17:

A UNIQUE OPPORTUNITY WITH UNIQUE CHALLENGES

• Unparalleled depth and breadth of talent: how to make them work together?

• Shortening the collaboration turnaround using coached workshops

• Fast-forward bootcamp-style data science training for a new crop of researchers

• Changing the career incentives: tool building, interdisciplinary research

• Inventing the institutional framework


Page 18:

GOALS AND TOOLS

• Build a community at Saclay around data science

• an agora where researchers and engineers can meet and talk

• a culture of crossing the disciplinary aisles

• get to know each other’s expertise and data analysis problems

• Tools

• an interactive web portal: http://datascience-paris-saclay.fr (preliminary)

• workshops, thematic days, and creativity workshops

• summer school(s)


Page 19:

GOALS AND TOOLS

• Get interdisciplinary collaborations off the ground

• seeding larger projects for outside funding (ANR, Europe)

• embedding data scientists in domain science labs and vice versa

• Tools

• financing postdoctoral projects, theses and (incoming) sabbatical/visiting stays

• 5 postdocs in first call (July 2014)

• 6 theses (within COFUND, launched)

• 2-3 projects in second call (planned)

• financing data challenges

• http://higgsml.lal.in2p3.fr


Page 20:

GOALS AND TOOLS

• Support software tool building

• primarily open source

• development, maintenance, reusability across disciplines

• leadership of Alexandre Gramfort (LTCI Telecom/CEA Neurospin) and Guillaume Wisniewski (LIMSI/UPSud)

• Tools

• the Open Software Initiative for interns and doctoral students

• 7 doctoral missions, very popular (17 candidates)

• “code consolidator” and engineering projects

• 4 engineering projects + 1 code consolidator in first call (July 2014)

• coding sprints

• scikit-learn coding sprint (July 2014)

• data science bootcamps


Page 21:

GOALS AND TOOLS

• Data science IT platform

• open data, open access, reproducible research (see IPOL)

• disseminating and sharing data and software

• connecting the CDS to data centers (e.g. Virtual Data)

• leadership of Cécile Germain (LRI / UPSud)

• Tools

• web portal: http://io.datascience-paris-saclay.fr (preliminary)

• work group
