DR / CNRS Pr / ENSAE LAL & LRI Laboratoire de Statistique ... · Pr / ENSAE! Laboratoire de...
Transcript of DR / CNRS Pr / ENSAE LAL & LRI Laboratoire de Statistique ... · Pr / ENSAE! Laboratoire de...
1
Center for Data ScienceParis-Saclay
DR / CNRS LAL & LRI
CNRS & University Paris-Sud
BALÁZS KÉGL
Pr / ENSAE Laboratoire de Statistique
ENSAE / CREST
ARNAK DALALYAN
B. Kégl / AppStat@LAL Learning to discover
Center for Data ScienceParis-Saclay
DATA SCIENCE
2
Design of automated methods
to analyze massive and complex data
to extract useful information
B. Kégl / AppStat@LAL Learning to discover
Center for Data ScienceParis-Saclay
THE DATA SCIENCE LANDSCAPE
3
Data scientist
Data trainer
Applied scientist
Domain expertSoftware engineer
Data engineer
Tool building Data domains
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
software engineeringclouds/grids
high-performancecomputing
optimization
energy and physical sciences health and life sciences Earth and environment
economy and society brain
B. Kégl / AppStat@LAL Learning to discover
Center for Data ScienceParis-Saclay
THE PARTICIPANTS
4
Biology & bioinformaticsIBISC/UEvry LRI/UPSudHepatinovCESP/UPSud-UVSQ-Inserm IGM-I2BC/UPSud MIA/AgroMIAj-MIG/INRALMAS/Centrale
ChemistryEA4041/UPSud
Earth sciencesLATMOS/UVSQ GEOPS/UPSudIPSL/UVSQLSCE/UVSQLMD/Polytechnique
EconomyLM/ENSAE RITM/UPSudLFA/ENSAE
NeuroscienceUNICOG/InsermU1000/InsermNeuroSpin/CEA
Particle physics astrophysics & cosmologyLPP/Polytechnique DMPH/ONERACosmoStat/CEAIAS/UPSudAIM/CEALAL/UPSud
The Paris-Saclay Center for Data ScienceData Science for scientific Data
250 researchers in 35 laboratories
Machine learningLRI/UPSud LTCI/TelecomCMLA/Cachan LS/ENSAELIX/PolytechniqueMIA/AgroCMA/PolytechniqueLSS/SupélecCVN/Centrale LMAS/CentraleDTIM/ONERAIBISC/UEvry
VisualizationINRIALIMSI
Signal processingLTCI/TelecomCMA/PolytechniqueCVN/CentraleLSS/SupélecCMLA/CachanLIMSIDTIM/ONERA
StatisticsLMO/UPSud LS/ENSAELSS/SupélecCMA/PolytechniqueLMAS/CentraleMIA/AgroParisTech
Data sciencestatistics
machine learninginformation retrieval
signal processingdata visualization
databases
Domain sciencehuman society
life brain earth
universe
Tool buildingsoftware engineering
clouds/gridshigh-performance
computingoptimization
Data scientist
Applied scientist
Domain scientist
Data engineer
Software engineer
Center for Data ScienceParis-Saclay
datascience-paris-saclay.fr
@SaclayCDS
Center for Data ScienceParis-Saclay
PARAMETERS
• 2 years: April 2014 - June 2016, 1.2M€
• +1 year, conditional on evaluation
• Light management
• executive committee of 17 members
• work groups
• management (around objectives)
• thematic (around scientific themes), open to everyone to propose and to participate
5
Center for Data ScienceParis-Saclay
GOALS
• Build a community at Saclay around data science
• Get interdisciplinary collaborations off the ground
• Support software tool building
• Data science IT platform (open data)
• Making the CDS a contact point to big data industry
6
Center for Data ScienceParis-Saclay
TOOLS
7
• We are designing and learning to manage tools to accompany data science projects with different needs
• If you have a project (or even some data and some ideas), you could choose from this “menu” of tools
Center for Data ScienceParis-Saclay
TOOLS
8
Data scientist
Data trainer
Applied scientist
Domain expertSoftware engineer
Data engineer
Tool building Data domains
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
• interdisciplinary projects • matchmaking tool • design and innovation strategy workshops • data challenges
• coding sprints • Open Software Initiative • code consolidator and engineering projects
software engineeringclouds/grids
high-performancecomputing
optimization
energy and physical sciences health and life sciences Earth and environment
economy and society brain
• data science bootcamps • IT platform for linked data • SaaS data science platform
Alex Balázs
Karima
Akin
Center for Data ScienceParis-Saclay
POSTDOCS, THESES, SABBATICALS
• Goals
• seeding larger projects for outside funding (ANR, Europe)
• embedding data scientists in domain science labs and vice verse
9
Center for Data ScienceParis-Saclay
POSTDOCS, THESES, SABBATICALS
• Common selection criteria
• scientific quality
• expected results both in domain science and data science
• relevance and feasibility
• (real) scientific data, available at the start data of the project
• interdisciplinarity (PIs both from domain and data sciences, different LABEXs)
• community building
• organizing and participating in thematic days, bootcamps, workshops
10
Center for Data ScienceParis-Saclay
POSTDOCS, THESES
• Schedule
• 5 postdocs in first call (July 2014)
• 6 theses (within COFUND, under evaluation)
• 2-3 projects in second call (planned in March 2015)
• a third call in 2016 in case the CDS is extended
11
Center for Data ScienceParis-Saclay
SABBATICALS, VISITING FELLOWSHIPS
• 3-12 months visiting (incoming) fellowships
• typically paying for replacement teaching
• open continuously
12
Center for Data ScienceParis-Saclay
BOOTCAMPS
• Single-day coding sessions
• 20-30 participants
• Goals
• training PhD students, postdocs, engineers, senior researchers for hands-on data science (problem types, tools)
• solving (prototyping) real data science problems
• networking, knowing each other
13
Center for Data ScienceParis-Saclay
BOOTCAMPS
• If you want to participate
• fill out a (longish) form: we need to know you
• sign up on the indico for upcoming sessions
• come prepared (read the material, install the starting kit, etc.)
14
Center for Data ScienceParis-Saclay
BOOTCAMPS
• If you want to propose a bootcamp
• contact us
• think about the data set and the problem you’d like to solve
• it’s OK if it’s not completely mature yet, we can run a workshop for defining the project
15
THANK YOU!
16
Center for Data ScienceParis-Saclay
A UNIQUE OPPORTUNITY WITH UNIQUE CHALLENGES
• Unparallelled depth and breadth of talent: how to make them work together?
• Shortening the collaboration turnaround using coached workshops
• Fast-forward bootcamp-style data science training for a new crop of researchers
• Changing the carrier incentives: tool building, interdisciplinary research
• Inventing the institutional framework
17
Center for Data ScienceParis-Saclay
GOALS AND TOOLS
• Build a community at Saclay around data science
• an agora where researchers and engineers can meet and talk
• a culture of crossing the disciplinary aisles
• get to know each other’s expertise and data analysis problems
• Tools
• an interactive web portal: http://datascience-paris-saclay.fr (preliminary)
• workshops, thematic days, and creativity workshops
• summer school(s)
18
Center for Data ScienceParis-Saclay
GOALS AND TOOLS
• Get interdisciplinary collaborations off the ground
• seeding larger projects for outside funding (ANR, Europe)
• embedding data scientists in domain science labs and vice verse
• Tools
• financing postdoctoral projects, theses and (incoming) sabbatical/visiting stays
• 5 postdocs in first call (July 2014)
• 6 theses (within COFUND, launched)
• 2-3 projects in second call (planned)
• financing data challenges
• http://higgsml.lal.in2p3.fr
19
Center for Data ScienceParis-Saclay
GOALS AND TOOLS
• Support software tool building • primarily open source
• development, maintenance, reusability across disciplines
• leadership of Alexandre Gramfort (LTCI Telecom/CEA Neurospin) and Guillaume Wisniewski (LIMSI/UPSud)
• Tools • the Open Software Initiative for interns and doctoral students
• 7 doctoral missions, very popular (17 candidates)
• “code consolidator” and engineering projects • 4 engineering projects + 1 code consolidator in first call (July 2014)
• coding sprints • scikit-learn coding sprint (July 2014)
• data science bootcamps
20
Center for Data ScienceParis-Saclay
GOALS AND TOOLS
• Data science IT platform
• open data, open access, reproducible research (see IPOL)
• disseminating and sharing data and software
• connecting the CDS to data centers (e.g. Virtual Data)
• leadership of Cécile Germain (LRI / UPSud)
• Tools
• web portal: http://io.datascience-paris-saclay.fr (preliminary)
• work group
21