Big Data, Data Science, Statistics - University of Toronto Data, Data Science, Statistics Nancy Reid...
Big Data, Data Science, Statistics
Nancy Reid
31 March 2017
Queen's University 31 March 2017
Big Data
= Big Machines
= Lots of Computing
= Complex Architectures
= Computer Science
Small data
= equations and formulas
= mathematical modelling
= a little computing
= Statistical Science
Big Data
• Interesting
• Detailed
• Informative
• Fun
Small Data
So yesterday
Canadian Institute for Statistical Sciences
Pacific Institute for Mathematical Sciences
Centre de Recherches Mathématiques
Fields Institute for Research in the Mathematical Sciences
Workshops
• Opening Conference and Bootcamp
• Statistical Machine Learning
• Optimization and Matrix Methods
• Visualization: Strategies and Principles
• Big Data in Health Policy
• Big Data for Social Policy
Opening Conference and Bootcamp
Introduction to topics at following workshops
One day on each topic
Many speakers started by trying to define big data
“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it … ”
Justice Potter Stewart; Jacobellis v. Ohio 22 June 1964
Robert Bell, Google, Plenary Opening Lecture
Some highlights
• natural gradient ascent
• uses Fisher information as metric tensor
Girolami and Calderhead (2011); Amari (1987); Rao (1945)
• Gaussian graphical model approximation to force sparse inverse
Grosse and Salakhutdinov (2015) 32nd International Conference on Machine Learning
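The natural gradient idea in the bullets above can be illustrated with a minimal sketch. It assumes a Gaussian model with parameters (mu, sigma), which is not the model on the slide; the function name, step size, and data are hypothetical. The ordinary gradient of the log-likelihood is premultiplied by the inverse Fisher information, which for n Gaussian observations is diag(n/sigma^2, 2n/sigma^2).

```python
def natural_gradient_gaussian(data, mu=0.0, sigma=1.0, step=0.5, iters=100):
    """Natural gradient ascent on the Gaussian log-likelihood in (mu, sigma)."""
    n = len(data)
    for _ in range(iters):
        # Ordinary gradient of the log-likelihood at the current (mu, sigma).
        resid = [x - mu for x in data]
        g_mu = sum(resid) / sigma**2
        g_sigma = -n / sigma + sum(r * r for r in resid) / sigma**3
        # Fisher information acts as the metric tensor; it is diagonal here,
        # so applying its inverse is a componentwise division.
        f_mu = n / sigma**2
        f_sigma = 2 * n / sigma**2
        mu += step * g_mu / f_mu
        sigma += step * g_sigma / f_sigma
    return mu, sigma

mu_hat, sigma_hat = natural_gradient_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
```

For these data the iterates approach the sample mean 3 and the maximum likelihood standard deviation sqrt(2). Rescaling by the Fisher information makes the update for mu independent of the current value of sigma, which is the practical appeal of the metric.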
• if just one binary top node, model for
is a logistic regression
• with several binary top nodes, model for
is also a logistic regression, with odds ratio depending
only on
• deep learning has ~10 layers, with millions of units in each layer
• estimating parameters is an optimization problem
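To make the last bullet concrete, here is a minimal sketch of parameter estimation as optimization for the simplest case named above, a logistic regression with one input. It uses plain gradient ascent on the average log-likelihood; the data, step size, and function names are hypothetical, and a deep network would replace this two-parameter problem with one in millions of parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, step=0.1, iters=5000):
    """Maximise the Bernoulli log-likelihood for P(y=1|x) = sigmoid(a + b*x)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        # Gradient of the average log-likelihood: residuals y - p,
        # summed directly for the intercept and weighted by x for the slope.
        ga = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        gb = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += step * ga
        b += step * gb
    return a, b

# Hypothetical, non-separable data so that the maximum is finite.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = fit_logistic(xs, ys)
```

At the optimum the fitted probabilities satisfy the score equations: the residuals y - p sum to zero, both unweighted and weighted by x.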
Leung et al Bioinformatics 2014
Brendan Frey, Infinite Genomes Project
FieldsLive January 27 2015
Some highlights
• lasso penalty
• is convex relaxation of
• many interesting penalties are non-convex
• optimization routines may not find global optimum
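The formulas on this slide did not survive in the transcript. As a hedged illustration, assume the usual reading that the lasso (l1) penalty is a convex relaxation of the l0 penalty, which counts nonzero coefficients. For a single coefficient with squared-error loss both penalized problems have closed-form solutions, sketched below; the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """argmin over b of 0.5*(z - b)**2 + lam*abs(b): the lasso (l1) solution."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def hard_threshold(z, lam):
    """argmin over b of 0.5*(z - b)**2 + lam*(b != 0): the l0 solution."""
    # Keeping b = z costs lam; setting b = 0 costs 0.5*z**2.
    return z if 0.5 * z * z > lam else 0.0

lasso_fit = [soft_threshold(z, 1.0) for z in (-3.0, -0.5, 1.0, 3.0)]
l0_fit = [hard_threshold(z, 1.0) for z in (-3.0, -0.5, 1.0, 3.0)]
```

The convex penalty shrinks every surviving coefficient towards zero, while the non-convex one leaves large coefficients untouched. In more than one dimension the non-convex problem no longer has a closed form, which is where the concern about missing the global optimum arises.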
• statistical error neighbourhood of true value
• approximation error iterating over t
Wainwright FieldsLive Jan 16 2015
Loh and Wainwright JMLR 2015
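One reading of the two bullets above is that the error of an iterative fit has two parts: the distance from the estimate to the true value, which iteration cannot reduce, and the distance from iterate t to the estimate, which shrinks as t grows. The sketch below shows this for one-dimensional least squares with simulated data. It is an assumption-laden toy, not the setting of the cited paper, and all names and constants are hypothetical.

```python
import random

random.seed(0)
beta_true = 2.0
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [beta_true * x + random.gauss(0, 1) for x in xs]

sxx = sum(x * x for x in xs) / len(xs)
sxy = sum(x * y for x, y in zip(xs, ys)) / len(xs)
beta_hat = sxy / sxx                     # least-squares estimate
stat_error = abs(beta_hat - beta_true)   # unaffected by further iterations

beta, step, opt_errors = 0.0, 0.5 / sxx, []
for t in range(30):
    # Gradient step on 0.5 * mean((y - beta*x)**2).
    beta -= step * (sxx * beta - sxy)
    opt_errors.append(abs(beta - beta_hat))
```

With this step size the distance to the estimate halves at every iteration, so after a handful of steps it is far below any plausible statistical error and further iteration buys nothing.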
Some highlights
innovis.cpsc.ucalgary.ca
Visualization
• statistical graphics
– data representation
– data exploration
– filtering, sampling, aggregation
• information visualization
• scientific visualization
• cognitive science and design
Visualization
KPMG Data Observatory, IC
Visualization
fivethirtyeight.com
guns
Some highlights
Health Policy Administrative Databases
Institute for Clinical and Evaluative Sciences
Thérèse Stukel, ICES
Some highlights
Privacy
• “Big Data and Innovation, Setting the Record Straight: De-identification Does Work” Privacy Commissioner of Ontario, July 2014
• “No silver bullet: De-identification still doesn’t work” Narayanan & Felten, July 2014
• Statistical Disclosure Limitation
• Differential Privacy
• Multi-party Communication
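As a small illustration of the differential privacy bullet, the sketch below releases a count with Laplace noise, a standard mechanism for this purpose. The records, the predicate, and the value of epsilon are hypothetical. A counting query changes by at most 1 when one record is added or removed, so noise with scale 1/epsilon is used.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(1)
ages = [23, 35, 41, 52, 67, 70, 29, 64, 80, 45]   # hypothetical records
released = private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
```

Smaller values of epsilon give stronger privacy and noisier answers. Each individual release is perturbed, but the noise averages out to zero over repeated hypothetical releases.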
Some highlights
What did we learn?
1. Statistical models for big data are complex, high-dimensional
– inference is well-studied, but difficult
2. Computational challenges include size and speed
– ideas of statistical inference get lost in the machine
• Data owners understand 2., but not 1.
• Data modellers understand 1., but not 2.
• Data science may be the best way to combine these
That was yesterday
• Data science programs “springing up like mushrooms after rain”
• Berkeley, Hopkins, CMU, Washington, UBC, Toronto, …
What is data science?
• a course?
• a set of courses?
• a job?
• a technology?
• a new field of research?
• a collaboration?
Data Science Program(s)
• mathematical reasoning
• statistical theory
• statistical and machine learning methods
• programming and software development
• algorithms and data structures
• communicating results and limitations
http://arxiv.org/abs/1609.00037v1
… Good Enough
• Data Management – from raw to ‘analysable’
• Software – programming
• Collaboration
• Project Organization
• Keeping Track
• Writing
knitr, tidyr, dplyr, ggplot2, ggvis, GitHub
Data Science Research
• data collection and data quality
• large N, small p
– computational strategies, e.g. Spark, Hadoop
– divide and conquer
• small n, large p
– inferential and computational strategies
– dimension reduction
– post-selection inference
– inference for extremes
• ‘new’ types of data: networks, graphs, text, images, …
– “alternative sources”
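The divide and conquer strategy for large N can be sketched in a few lines: each chunk of data is reduced to a small summary, and only the summaries are combined. The example assumes the target is a mean and variance, for which the count, sum, and sum of squares are sufficient; the chunk size and data are hypothetical, and in practice each chunk would sit on a different machine.

```python
def summarise(chunk):
    """Sufficient statistics for the mean and variance of one chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def combine(summaries):
    """Merge per-chunk statistics into an overall mean and variance."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean**2   # population (maximum likelihood) variance
    return mean, variance

data = [float(i % 7) for i in range(1000)]
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
mean, variance = combine([summarise(c) for c in chunks])
```

Here the combined answer agrees with the full-data computation up to rounding because the summaries are sufficient. For estimators without such summaries, combining chunk-level results is only approximate, and that is where the inferential questions begin.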
… Data Science Research
• collaboration and communication
• data wrangling, database development, record linkage, privacy
• replicability, reproducibility, new workflows
• visualization
• outside the ivory tower -- industry, government, media, public
Tripods (Transdisc Research in Princ…)
Fundamental research areas that may be a part of the focus of a transdisciplinary collaboration under this solicitation include, but are not limited to:
• Combinatorial inference on complex structures;
• Tradeoffs between computational costs and statistical efficiency;
• Randomized numerical linear algebra;
• Representation theory and non-commutative harmonic analysis;
• Topological data analysis (TDA) and homological algebra; and
• Multiple areas in machine learning including deep learning.
Published Feb 2016
I. General Perspectives
II. Data-Centric, Exploratory Methods
III. Efficient Algorithms
IV. Graph Approaches
V. Model Fitting and Regularization
VI. Ensemble Methods
VII. Causal Inference
VIII. Targeted Learning
The push back
How big data threatens democracy and increases inequality
“if the assessment never asks about race, how could the algorithm throw up racially biased results?”
“Credit scores are used by nearly half of American employers to screen potential employees”
“Big data is neither easier nor faster nor cheaper”
“Building a database doesn’t create its own uses”
My impression was that there is a sense in which ML is to statistics what robotization is to society: a job threat demanding a compelling reexamination of what is left for human statisticians to do,
Privacy
March 27
March 29
The push back
“Big data” has arrived, but big insights have not
“A range of other problems”
Michael Jordan, UC Berkeley
“while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to that tool when I’m consulting out in industry.
I find that industry people are often looking to solve a range of other problems, often not involving “pattern recognition” problems”
accurate answers quickly; meaningful error bars; merge various data sources; visualize and present conclusions; diagnostics; non-stationarity; targeted experiments within databases
Caution can be a good thing
“Digital Hippocratic Oath”
“…from data we will get the cure for cancer as well as
better hospitals;
schools that adapt to children’s needs making them
happier and smarter;
better policing and safer homes;
and of course jobs.”
Guardian 2 July 2016
Big Data 2013
Gartner Hype Cycle
Big Data 2014
2015
Machine Learning
Gartner Hype Cycle 2016
Smart Data Discovery
Thank You!