Deep learning and the systemic challenges of data science initiatives
-
Upload
balazs-kegl -
Category
Science
-
view
273 -
download
0
Transcript of Deep learning and the systemic challenges of data science initiatives
![Page 1: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/1.jpg)
Center for Data ScienceParis-Saclay1
CNRS & University Paris SaclayBALÁZS KÉGL
DEEP LEARNING AND THE SYSTEMIC CHALLENGES OF DATA SCIENCE
INITIATIVES
![Page 2: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/2.jpg)
2
I’m not going to explain deep learning in detail
![Page 3: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/3.jpg)
3
Rather: give an overview of what you can do with it
![Page 4: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/4.jpg)
• Vincent Vanhoucke (Google)
• Hugo Larochelle (Twitter)
• Andrew Ng (Baidu)
• Nando de Freitas (Oxford, Google DeepMind)
4
DEEP LEARNING COURSES
![Page 5: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/5.jpg)
5
Your challenges are not technological but organizational
![Page 6: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/6.jpg)
• Technology is disruptive
• The current organization of research is half broken and changing
• Misplaced incentives, interdisciplinarity, peer-reviewed publications, code vs papers, funding, reproducibility, questions around data-driven scientific method
• We are using few of the tools developed mainly in industry to manage disruptive innovation
6
WHY CHALLENGES ARE ORGANIZATIONAL?
![Page 7: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/7.jpg)
Center for Data ScienceParis-Saclay
• Intro to deep learning
• The PS-CDS
• The data science ecosystem: challenges
• Some tools
7
OUTLINE
![Page 8: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/8.jpg)
• You have a prediction or inference problem y = f(x)
• X: photo, spectrum, y: galaxy/star and redshift
• X: calorimetric image, y: particle parameters
• X: particle parameters, y: calorimetric image
8
DATA-DRIVEN INFERENCE
![Page 9: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/9.jpg)
• You have a prediction or inference problem y = f(x)
• You have no model to fit, but a large set of (x, y) pairs
• The source is (typically) either
• observation + human labeling
• simulation
• And a loss function L(y, ypred)9
DATA-DRIVEN INFERENCE
![Page 10: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/10.jpg)
• The solution
• Design/define a lot of application/domain-dependent cues/features hj(x), or use “simple” nonlinear functions (like trees, basis functions, filters).
• Learn a linear function f(x) = ∑j wj hj(x)
• shallow neural nets, ensemble methods, kernel methods
• Works well for most of the practical problems (but not all)
10
THE SHALLOW LEARNING PARADIGM
![Page 11: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/11.jpg)
11
Your most important question is:
are you in the “not all” part?
![Page 12: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/12.jpg)
• The solution
• Parametrize f(x) = f(x, w)
• w is very high-dimensional, f has a lot of capacity
• make everything quasi-differentiable (L and f(.,w))
• regularize (L1, L2, dropout, etc.)
• learn w using stochastic gradient descent
12
THE DEEP LEARNING PARADIGM
![Page 13: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/13.jpg)
• From a design (user) point of view
• Instead of hand-crafting (families of) informative features, you will design a system of reusable blocks of differentiable functions
• Close to the data, domain knowledge is important
• Deeper layers are rather general
• A lot of partly reusable trial-end-error tricks
• Pre-trained and saved networks/blocks,
• “dark knowledge”
13
SHALLOW TO DEEP LEARNING
![Page 14: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/14.jpg)
14
STATE OF THE ART
• Computer vision
• close to the data: convolutional layers, max pooling
• Sequential data (speech, language)
• recurrent nets, networks with memory (LSTM)
• Multi-modal embeddings (eg: caption generation)
• (Half) future: robotics, Turing machine, reasoning, neural simulators
![Page 15: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/15.jpg)
• Tools, techniques
• deep learning libraries (Theano, TensorFlow, Caffe, Torch)
• automatic differentiation
• stochastic gradient descent
• hyperparameter optimization
• lots of data and machines (GPUs)
15
THE DEEP LEARNING PARADIGM
![Page 16: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/16.jpg)
16
I will stop talking about science
Well, not really
![Page 17: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/17.jpg)
17
I will talk about management
(of) (data) science
![Page 18: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/18.jpg)
Center for Data ScienceParis-Saclay
• My eight-year of experience interfacing between high-energy physics and data science
• Our two-year experience of running PS-CDS
• Extensive collaboration with management scientist
18
WHERE DOES IT COME FROM?
![Page 19: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/19.jpg)
Center for Data ScienceParis-Saclay
DATA SCIENCE IN THE WORLD
19
![Page 20: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/20.jpg)
Center for Data ScienceParis-Saclay20
UNIVERSITÉ PARIS-SACLAY
19 founding partners
![Page 21: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/21.jpg)
Center for Data ScienceParis-Saclay
UNIVERSITÉ PARIS-SACLAY
21
+ horizontal multi-disciplinary and multi-partner initiatives to create cohesion
![Page 22: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/22.jpg)
Center for Data ScienceParis-Saclay22
Center for Data ScienceParis-Saclay
A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay
http://www.datascience-paris-saclay.fr/
Biology & bioinformaticsIBISC/UEvry LRI/UPSudHepatinovCESP/UPSud-UVSQ-Inserm IGM-I2BC/UPSud MIA/AgroMIAj-MIG/INRALMAS/Centrale
ChemistryEA4041/UPSud
Earth sciencesLATMOS/UVSQ GEOPS/UPSudIPSL/UVSQLSCE/UVSQLMD/Polytechnique
EconomyLM/ENSAE RITM/UPSudLFA/ENSAE
NeuroscienceUNICOG/InsermU1000/InsermNeuroSpin/CEA
Particle physics astrophysics & cosmologyLPP/Polytechnique DMPH/ONERACosmoStat/CEAIAS/UPSudAIM/CEALAL/UPSud
The Paris-Saclay Center for Data ScienceData Science for scientific Data
250 researchers in 35 laboratories
Machine learningLRI/UPSud LTCI/TelecomCMLA/Cachan LS/ENSAELIX/PolytechniqueMIA/AgroCMA/PolytechniqueLSS/SupélecCVN/Centrale LMAS/CentraleDTIM/ONERAIBISC/UEvry
VisualizationINRIALIMSI
Signal processingLTCI/TelecomCMA/PolytechniqueCVN/CentraleLSS/SupélecCMLA/CachanLIMSIDTIM/ONERA
StatisticsLMO/UPSud LS/ENSAELSS/SupélecCMA/PolytechniqueLMAS/CentraleMIA/AgroParisTech
Data sciencestatistics
machine learninginformation retrieval
signal processingdata visualization
databases
Domain sciencehuman society
life brain earth
universe
Tool buildingsoftware engineering
clouds/gridshigh-performance
computingoptimization
Data scientist
Applied scientist
Domain scientist
Data engineer
Software engineer
Center for Data ScienceParis-Saclay
datascience-paris-saclay.fr
@SaclayCDS
LIST/CEA
![Page 23: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/23.jpg)
Center for Data ScienceParis-Saclay
DATA SCIENCE
23
Design of automated methods
to analyze massive and complex data
to extract useful information
![Page 24: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/24.jpg)
Center for Data ScienceParis-Saclay24
CENTER FOR DATA SCIENCE=
DATA CENTER
We are focusing on inference:
data knowledge
Interfacing with HPC, cloud, storage, production, privacy, security
![Page 25: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/25.jpg)
25
WHAT IS NEW?
“As the flow of data increases, it is increasingly processed, analyzed, and acted upon by machines, not
humans.”
NYU-CDS manifesto
![Page 26: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/26.jpg)
• We have the data
• statistical / physical modeling is less important
• data-driven prediction
• We have the computational power
• We have the algorithms
• deep learning breakthrough: image, speech, language
• closing on AI, step by step
26
WHAT IS NEW?
![Page 27: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/27.jpg)
Center for Data ScienceParis-Saclay27
https://medium.com/@balazskegl
![Page 28: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/28.jpg)
Domain scienceenergy and physical sciences
health and life sciences Earth and environment
economy and society brain
28
THE DATA SCIENCE LANDSCAPEData scientist
Data trainer
Applied scientist
Domain scientistSoftware engineer
Data engineer
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
Tool building software engineering
clouds/grids high-performance
computing optimization
![Page 29: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/29.jpg)
Center for Data ScienceParis-Saclay
• (The lack of) manpower
• especially at the interfaces
• industrial brain-drain
• Incentives
• data scientists are not incentivized to work on domain science
• scientists are not incentivized to work on tools
• Access
• no well-developed channels to identify the right experts for a given problem
• Tools
• few tools that can help domain scientists and data scientists to collaborate efficiently
29
CHALLENGES
![Page 30: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/30.jpg)
Center for Data ScienceParis-Saclay
TOOLS
30
We are designing and learning to manage tools to
accompany data science projects with different needs
![Page 31: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/31.jpg)
Center for Data ScienceParis-Saclay
TOOLS: LANDSCAPE TO ECOSYSTEM
31
Data scientist
Data trainer
Applied scientist
Domain expertSoftware engineer
Data engineer
Tool building Data domains
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
• interdisciplinary projects • matchmaking tool • design and innovation strategy workshops • data challenges
• coding sprints • Open Software Initiative • code consolidator and engineering projects
software engineeringclouds/grids
high-performancecomputing
optimization
energy and physical sciences health and life sciences Earth and environment
economy and society brain
• data science RAMPs and TSs • IT platform for linked data • annotation tools • SaaS data science platform
![Page 32: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/32.jpg)
32
• Efficient exploration of the space of innovative ideas
• Communication, knowledge sharing
• Project building
DESIGNING DATA SCIENCE PROJECTS
![Page 33: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/33.jpg)
Center for Data ScienceParis-Saclay
DESIGNING DATA SCIENCE PROJECTS
33
Data value
Data analytics
Exploration of value !
• design theory • data-based prospection • innovation workshops
Problem formulation Problem solving
!• specialized teams • RAMPs / training sprints • data challenges
![Page 34: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/34.jpg)
Center for Data ScienceParis-Saclay
• Putting domain scientists, data scientists, and management scientist in the same room
• Getting them understand each other
• Keeping them collectively creative
• The goal: identifying and defining projects
• low-hanging fruits
• breakthrough projects
• long-term vision
34
DESIGN AND INNOVATION STRATEGY WORKSHOPS
![Page 35: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/35.jpg)
Center for Data ScienceParis-Saclay
innovative design
=
interaction and joint expansion of concepts
and knowledge
35
DESIGN AND INNOVATION STRATEGY WORKSHOPS
C/K design theory
![Page 36: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/36.jpg)
Center for Data ScienceParis-Saclay36
DESIGN AND INNOVATION STRATEGY WORKSHOPS
[P]$Project$building!Ini3alisa3on$
[K]$Knowledge$sharing$
Workshops$
[C]$IFM?Design$Workshops$
[RUN]!
DKCP process: linearizing C-K dynamics
![Page 37: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/37.jpg)
Center for Data ScienceParis-Saclay
THREE ANALYTICS TOOLS FOR INITIATING DOMAIN-DATA SCIENCE INTERACTIONS
37
RAPID ANALYTICS AND MODEL PROTOTYPING
(RAMP)
DATA CHALLENGES
TRAINING SPRINTS (TS)
![Page 38: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/38.jpg)
Center for Data ScienceParis-Saclay
DATA CHALLENGES
38
![Page 39: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/39.jpg)
Center for Data ScienceParis-Saclay
DATA CHALLENGES
39
• A data challenge is a recently developed unconventional dissemination and communication tool
• a scientific or industrial data producer arrives with a well-defined problem and a corresponding annotated data set
• defines a quantitative goal
• makes the problem and part of the data set (the training set) public on a dedicated site
• data science experts then take the public training data and submit solutions (predictions) for a test set with hidden annotations
• submissions are evaluated numerically using the quantitative measure
• contestants are listed on a leaderboard
• after a predefined time, typically a couple of months, the final results are revealed and the winners are awarded
![Page 40: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/40.jpg)
Center for Data ScienceParis-Saclay
DATA CHALLENGES
40
• The HiggsML challenge on Kaggle
• https://www.kaggle.com/c/higgs-boson
![Page 41: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/41.jpg)
Center for Data ScienceParis-Saclay
HUGE PUBLICITY
41
B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
14
![Page 42: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/42.jpg)
Center for Data ScienceParis-Saclay
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
42
B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
15
![Page 43: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/43.jpg)
43
HUGE PUBLICITY
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
yet partially missing the objectives
![Page 44: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/44.jpg)
• Challenges are useful for
• generating visibility in the data science community about novel application domains
• benchmarking in a fair way state-of-the-art techniques on well-defined problems
• finding talented data scientists
• Limitations
• not necessary adapted to solving complex and open-ended data science problems in realistic environments
• no direct access to solutions and data scientist
• emphasizes competition
44
DATA CHALLENGES
![Page 45: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/45.jpg)
45
We decided to design something better
![Page 46: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/46.jpg)
46
RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)
• Prototyping
• Training
• Collaboration building
![Page 47: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/47.jpg)
Center for Data ScienceParis-Saclay47
RAMPS
• Single-day coding sessions
• 20-40 participants
• preparation is similar to challenges
• Goals
• focusing and motivating top talents
• promoting collaboration, speed, and efficiency
• solving (prototyping) real problems
![Page 48: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/48.jpg)
48
TRAINING SPRINTS
• Single-day training sessions
• 20-40 participants
• focusing on a single subject (deep learning, model tuning, functional data, etc.)
• preparing RAMPs
![Page 49: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/49.jpg)
49
ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE
![Page 50: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/50.jpg)
Center for Data ScienceParis-Saclay
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
50
B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
15
![Page 51: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/51.jpg)
Center for Data ScienceParis-Saclay51
ANALYTICS TOOL TO PROMOTE COLLABORATION AND CODE REUSE
![Page 52: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/52.jpg)
Center for Data ScienceParis-Saclay
ANALYTICS TOOLS TO MONITOR PROGRESS
52
![Page 53: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/53.jpg)
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Jan 15 The HiggsML challenge
53
![Page 54: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/54.jpg)
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Apr 10 Classifying variable stars
54
![Page 55: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/55.jpg)
Center for Data ScienceParis-Saclay
VARIABLE STARS
55
![Page 56: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/56.jpg)
Learning to discoverB. Kégl / CNRS - Saclay
VARIABLE STARS
56
accuracy improvement: 89% to 96%
![Page 57: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/57.jpg)
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 June 16 and Sept 26 Predicting El Nino
57
![Page 58: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/58.jpg)
58
RAPID ANALYTICS AND MODEL PROTOTYPING
RMSE improvement: 0.9˚C to 0.4˚C
![Page 59: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/59.jpg)
59
2015 October 8 Insect classification
RAPID ANALYTICS AND MODEL PROTOTYPING
![Page 60: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/60.jpg)
60
RAPID ANALYTICS AND MODEL PROTOTYPING
accuracy improvement: 30% to 70%
![Page 61: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/61.jpg)
61
2015 Fall Drug identification from spectra
RAPID ANALYTICS AND MODEL PROTOTYPING
![Page 62: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/62.jpg)
• Teaching support
• Networking and HR support
• Support for collaborative team work
62
THE RAMP TOOL
A prototyping tool for collaborative development of data science workflows
![Page 63: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/63.jpg)
Center for Data ScienceParis-Saclay63
THANK YOU!
![Page 64: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/64.jpg)
Center for Data ScienceParis-Saclay
• A window to open data at Paris-Saclay
• We are not storing or handling existing large data sets
• Rather indexing, linking, and mapping, embedding in the worldwide linked data (RDF) ecosystem
• Storing small data sets of small teams is possible
• Subsets of large sets for prototyping
• Or simply store metadata plus pointer
64
IT PLATFORM FOR LINKED DATA
http://io.datascience-paris-saclay.fr/
![Page 65: Deep learning and the systemic challenges of data science initiatives](https://reader031.fdocuments.in/reader031/viewer/2022030313/58a5eb1c1a28aba5728b4dcd/html5/thumbnails/65.jpg)
Center for Data ScienceParis-Saclay
IT PLATFORM FOR LINKED DATA
65