Post on 01-Jul-2018
1/14 Big Data
© P. DELORT Mines ParisTech, Centre de Recherche en Informatique.
Pierre Delort
President of the French National Association of CIOs (Association Nationale des DSI).
OECD ICCP Technology Foresight Forum
"Harnessing data as a new source of growth: Big data analytics and policies"
2/14 Big Data
Research and the Internet
www.serimedis.inserm.fr
3/14 Big Data
What you have typed
Oscar nomination
Nature, Vol. 457, 19 February 2009
4/14 Big Data
The 45 queries and their topics
logit(I(t)) = α · logit(Q(t)) + ε
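As a minimal sketch of how the relation above can be fitted, the snippet below estimates α by least squares on logit-transformed values. It assumes I(t) is the share of physician visits for influenza-like illness and Q(t) the share of flu-related queries in week t; the weekly values, the helper names (`fit_alpha`, `predict`) and the value of `TRUE_ALPHA` are made up for illustration and are not taken from the Nature paper.

```python
import math

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_alpha(q_fracs, i_fracs):
    """Least-squares alpha for logit(I) = alpha * logit(Q) + eps (no intercept, as on the slide)."""
    xs = [logit(q) for q in q_fracs]
    ys = [logit(i) for i in i_fracs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def predict(alpha, q):
    """Estimated illness share for a given query share q."""
    return inv_logit(alpha * logit(q))

# Hypothetical weekly query shares (illustration only, not real data)
Q = [0.001, 0.002, 0.004, 0.003, 0.0015]
TRUE_ALPHA = 0.6  # assumption used to generate noise-free synthetic I(t)
I = [inv_logit(TRUE_ALPHA * logit(q)) for q in Q]

alpha_hat = fit_alpha(Q, I)
print(f"estimated alpha: {alpha_hat:.3f}")
```

With noise-free synthetic data the fit recovers the generating α; with real surveillance data the error term ε would not vanish and the estimate would need validation against held-out weeks.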
5/14 Big Data
US, Mid-Atlantic region, 2007-08 season
Google Flu Trends (black), CDC (red)
Results
United States: spread of the virus
Flu estimate
Google Flu Trends
CDC’s Sentinel
Induction: “the glory of science and the scandal of philosophy” (C. D. Broad)
6/14 Big Data
NGS & Moore’s law
Ion Proton sequencer
Bottleneck = data analysis
Cost per megabase over 10 years: divided by 10,000
About 100 times faster than Moore’s law
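The comparison with Moore's law can be checked with a few lines of arithmetic. The sketch below assumes Moore's law is read as a doubling every 18 months (a common reading, but an assumption here; with a 24-month doubling the gap would be larger). The factor of 10,000 comes from the slide.

```python
YEARS = 10
SEQUENCING_COST_FACTOR = 10_000   # from the slide: cost per megabase divided by 10,000
DOUBLING_PERIOD_YEARS = 1.5       # assumption: Moore's law as a doubling every 18 months

# Improvement factor predicted by Moore's law over the same period
moore_factor = 2 ** (YEARS / DOUBLING_PERIOD_YEARS)

# How much faster sequencing cost fell than Moore's law would predict
ratio = SEQUENCING_COST_FACTOR / moore_factor

print(f"Moore's law over {YEARS} years: x{moore_factor:.0f}")
print(f"Sequencing cost decrease relative to Moore's law: x{ratio:.0f}")
```

Under this assumption Moore's law gives a factor of roughly 100 over ten years, so a 10,000-fold cost decrease is about 100 times larger, which matches the order of magnitude on the slide.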
7/14 Big Data
NGS: impact on skills
Significant increase (×3 to ×10) in the need for:
• (bio) computer scientists;
• (bio) statisticians.
Sboner A, et al.: The real cost of sequencing: higher than you think! Genome Biology 2011, 12:125.
[Figure: Genome's sequencing cost. Share of sequencing cost (0% to 100%) in 2000 (before NGS), 2010 (NGS) and 2020 (est.), split into sampling & experiment design, sequencing, data management, data reduction & synthesis, and downstream analysis.]
If you are looking for a career where your
services will be in high demand, you should
find something where you provide a scarce,
complementary service to something that is
getting ubiquitous and cheap. So what’s
getting ubiquitous and cheap? Data. And
what is complementary to data? Analysis. –
Prof. Hal Varian, UC Berkeley, Chief
Economist at Google, 2008.
8/14 Big Data
The last ten years
This leads to three technologies that I believe will drive the future of Big Data computing:
• In-Memory ;
• SSD ;
• MPP.
Sharp decrease in solid-state device cost
Innovation in DB & software
Increase in computing power
64-bit addressable memory (DRAM): × 4.2 × 10^9
DRAM: / 10^2
Flash NAND: / 10^3
/50 to /10x
× 100
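The order of magnitude of the 64-bit addressable memory factor on this slide can be reproduced with a short calculation. The sketch assumes the factor compares a 64-bit address space with a 32-bit one; that reading is an assumption, since the slide does not state the baseline.

```python
ADDRESS_BITS_OLD = 32  # assumption: baseline is 32-bit addressing
ADDRESS_BITS_NEW = 64

old_space = 2 ** ADDRESS_BITS_OLD  # addressable bytes with 32-bit addresses
new_space = 2 ** ADDRESS_BITS_NEW  # addressable bytes with 64-bit addresses

# Growth factor of the addressable space
factor = new_space // old_space

print(f"32-bit address space: {old_space / 2**30:.0f} GiB")
print(f"growth factor: {factor:.2e}")
```

Under this assumption the factor is 2^32, about 4.3 × 10^9, which is close to the figure quoted on the slide.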
9/14 Big Data
Three technologies
• In-Memory: very fast (I/O), saves on Opex (DB tuning, energy…) but non-persistent, requires reformatting of the software, expensive and (today) of limited scalability;
• SSD (Flash/NAND): fast (I/O), no reformatting required, persistent, scalable, extensible, cheap; saves on both Capex and Opex (energy);
• MPP (MapReduce): very scalable (n ~ 10^5), fit for “low-density data” (key/value), but requires new programming skills and Opex (energy).
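To illustrate the key/value programming style that the MapReduce bullet above refers to, here is a minimal single-process sketch of the map, shuffle and reduce steps, applied to a word count. It is not a distributed implementation and does not use any real MapReduce framework; all function names and the sample input are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and yield the (key, value) pairs it emits."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)

# Hypothetical input, for illustration only
lines = ["big data big analysis", "data is cheap"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts)
```

In a real MPP setting the map and reduce calls would run on many nodes in parallel, and the shuffle would move data across the network; the programmer still only writes the mapper and the reducer, which is the "new programming skill" the slide mentions.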
[Figure: memory and storage technologies (Tape, Disk, Flash, DRAM, CPU cache) plotted on axes Bandwidth (MB/s) and Cost, log scale. Source: Objective Analysis]
10/14 Big Data
Harvard Business Review, November 2010, p. 58
E-science: the 4th stage of science
11/14 Big Data
Impacts on Research & Innovation
[Diagram labels: Research; Firm / Innovation; Inspiration; EXPERIMENTATION (2nd Paradigm); (BIG) DATA MINING (4th Paradigm); Data; Hypothesis; Limiting factor; Idea generation; Exploration; Explanation; Validation; Publication; Protection; Valorisation; Conversion; Selection; Diffusion]
Induction: “the glory of science and the scandal of philosophy” (C. D. Broad)
12/14 Big Data
Public data: a right for citizens as well as scientists?
data.gouv.fr
13/14 Big Data
CDO: a new role for the (business) CIO?
Will the CIO:
1. be in charge of the (rich, up-to-date, reliable…) data, whatever their origin (in business):
• internal;
• external;
• public;
• private, available for acquisition; or
• dependent on the action of a regulator;
2. and act as a Chief Data Officer, in charge of “Data Informed Decisions”*?
*Kolker E., 2012
“I keep saying that the
sexy job in the next 10
years will be
statisticians,” said Hal
Varian, chief economist
at Google. “And I’m
not kidding.”
14/14 Big Data
Wisdom
Knowledge
Information
Data
pdt@andsi.fr - pierre.delort@cri.mines-paristech.fr
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
T.S. Eliot
Lee Scott, Wal-Mart CEO, to the question “What happened to K-Mart?”: his answer, in a word, was “coupons”. K-Mart overdid it with coupons, which became too much a hunk of its overhead, while also narrowing its customer base toward coupon clippers. By contrast, Wal-Mart minimized that kind of thing, focusing instead on promising “everyday low price”.
In The Intention Economy, Doc Searls, 2012.