Platforms for Data Science - Computing on the Brink

Transcript of Platforms for Data Science - Computing on the Brink

Page 1: Platforms for Data Science - Computing on the Brink

There is no magic
There is only awesome

Deepak Singh

Platforms for data science

Page 3: Platforms for Data Science - Computing on the Brink

3

Page 4: Platforms for Data Science - Computing on the Brink

collection

Page 5: Platforms for Data Science - Computing on the Brink

curation

Page 6: Platforms for Data Science - Computing on the Brink

analysis

Page 7: Platforms for Data Science - Computing on the Brink

what’s the big deal?

Page 8: Platforms for Data Science - Computing on the Brink
Page 10: Platforms for Data Science - Computing on the Brink

Image: Yael Fitzpatrick (AAAS)

Page 11: Platforms for Data Science - Computing on the Brink

Image: Yael Fitzpatrick (AAAS)

Page 12: Platforms for Data Science - Computing on the Brink

lots of data

Page 13: Platforms for Data Science - Computing on the Brink

lots of people

Page 14: Platforms for Data Science - Computing on the Brink

lots of places

Page 15: Platforms for Data Science - Computing on the Brink

constant change

Page 16: Platforms for Data Science - Computing on the Brink

we want to make our data more effective

Page 17: Platforms for Data Science - Computing on the Brink

versioning

Page 18: Platforms for Data Science - Computing on the Brink

provenance

Page 19: Platforms for Data Science - Computing on the Brink

filter

Page 20: Platforms for Data Science - Computing on the Brink

aggregate

Page 21: Platforms for Data Science - Computing on the Brink

extend

Page 22: Platforms for Data Science - Computing on the Brink

mashup

Page 23: Platforms for Data Science - Computing on the Brink

human interfaces

Page 24: Platforms for Data Science - Computing on the Brink
Page 26: Platforms for Data Science - Computing on the Brink

hard problem

Page 27: Platforms for Data Science - Computing on the Brink

really hard problem

Page 28: Platforms for Data Science - Computing on the Brink

so how do we get there?

Page 29: Platforms for Data Science - Computing on the Brink

information platforms

Page 31: Platforms for Data Science - Computing on the Brink

dataspaces

Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data

Page 32: Platforms for Data Science - Computing on the Brink

the unreasonable effectiveness of data

Halevy, et al. IEEE Intelligent Systems, 24, 8-12 (2009)

Page 33: Platforms for Data Science - Computing on the Brink

accept all data formats

Page 34: Platforms for Data Science - Computing on the Brink

evolve APIs

Page 35: Platforms for Data Science - Computing on the Brink

beyond databases and the data warehouse

Page 36: Platforms for Data Science - Computing on the Brink

data as a programmable resource

Page 37: Platforms for Data Science - Computing on the Brink

data is a royal garden

Page 38: Platforms for Data Science - Computing on the Brink

compute is a fungible commodity

Page 39: Platforms for Data Science - Computing on the Brink

optimizing the most valuable resource

Page 40: Platforms for Data Science - Computing on the Brink

compute, storage, workflows, memory, transmission, algorithms, cost, …

Page 43: Platforms for Data Science - Computing on the Brink

my bias

Page 44: Platforms for Data Science - Computing on the Brink

cloud services

Page 45: Platforms for Data Science - Computing on the Brink

distributed systems

Page 46: Platforms for Data Science - Computing on the Brink

scale

Page 47: Platforms for Data Science - Computing on the Brink

global

Page 48: Platforms for Data Science - Computing on the Brink

consumption models

Page 49: Platforms for Data Science - Computing on the Brink

on-demand

Page 50: Platforms for Data Science - Computing on the Brink

what is the value of your data?

Page 51: Platforms for Data Science - Computing on the Brink
Page 52: Platforms for Data Science - Computing on the Brink
Page 53: Platforms for Data Science - Computing on the Brink

Credit: Angel Pizarro, U. Penn

Page 55: Platforms for Data Science - Computing on the Brink
Page 56: Platforms for Data Science - Computing on the Brink

Bioproximity

http://aws.amazon.com/solutions/case-studies/bioproximity/

Page 57: Platforms for Data Science - Computing on the Brink
Page 58: Platforms for Data Science - Computing on the Brink
Page 59: Platforms for Data Science - Computing on the Brink

30,472 cores

Page 60: Platforms for Data Science - Computing on the Brink

$1279/hr
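
As a rough reading of these two figures, the sketch below works out the implied price per core-hour. It assumes the $1279/hr on this slide is the hourly cost of the whole 30,472-core cluster from the previous slide; the function and variable names are illustrative and do not come from the talk.

```python
def cost_per_core_hour(cluster_cost_per_hour: float, cores: int) -> float:
    """Return the hourly cost of a single core, given the cluster-wide hourly cost."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return cluster_cost_per_hour / cores


# Figures taken from the two slides; pairing them is an assumption.
CORES = 30_472
CLUSTER_COST_PER_HOUR = 1279.0  # US dollars per hour

rate = cost_per_core_hour(CLUSTER_COST_PER_HOUR, CORES)
print(f"about ${rate:.3f} per core-hour")
```

Under that assumption the rate works out to roughly four cents per core-hour, which is the sense in which compute is treated here as a fungible commodity.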

Page 61: Platforms for Data Science - Computing on the Brink

http://cloudbiolinux.org/

Page 62: Platforms for Data Science - Computing on the Brink

http://usegalaxy.org/cloud

Page 63: Platforms for Data Science - Computing on the Brink

in summary

Page 64: Platforms for Data Science - Computing on the Brink

large scale data requires a rethink

Page 65: Platforms for Data Science - Computing on the Brink

data architecture

Page 66: Platforms for Data Science - Computing on the Brink

compute architecture

Page 67: Platforms for Data Science - Computing on the Brink

distributed, programmable infrastructure

Page 68: Platforms for Data Science - Computing on the Brink

cloud services

Page 69: Platforms for Data Science - Computing on the Brink

remove constraints

Page 70: Platforms for Data Science - Computing on the Brink

can we build data science platforms?

Page 71: Platforms for Data Science - Computing on the Brink

there is no magic
there is only awesome

Page 72: Platforms for Data Science - Computing on the Brink

Twitter: @mndoci

http://slideshare.net/mndoci
http://mndoci.com

Inspiration and ideas from Matt Wood & Larry Lessig

Credit: Oberazzi, under a CC-BY-NC-SA license