Platforms for Data Science - Computing on the Brink

Posted on 25-Jan-2015

There is no magic. There is only awesome.

Deepak Singh

Platforms for data science

3

collection

curation

analysis

what’s the big deal?

Image: Yael Fitzpatrick (AAAS)

lots of data

lots of people

lots of places

constant change

we want to make our data more effective

versioning

provenance

filter

aggregate

extend

mashup

human interfaces

hard problem

really hard problem

so how do we get there?

information platforms

dataspaces

Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data

the unreasonable effectiveness of data

Halevy, et al. IEEE Intelligent Systems, 24, 8-12 (2009)

accept all data formats

evolve APIs

beyond databases and the data warehouse

data as a programmable resource

data is a royal garden

compute is a fungible commodity

optimizing the most valuable resource

compute, storage, workflows, memory, transmission, algorithms, cost, …

my bias

cloud services

distributed systems

scale

global

consumption models

on-demand

what is the value of your data?

Credit: Angel Pizarro, U. Penn

Bioproximity

http://aws.amazon.com/solutions/case-studies/bioproximity/

30,472 cores

$1279/hr
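
To put those two figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the $1279/hr applies to the full 30,472 cores, which the slides do not state explicitly. The function names and the 24-hour run length are illustrative only.

```python
def cost_per_core_hour(hourly_cost_usd: float, cores: int) -> float:
    """Average cost of one core for one hour, in US dollars."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return hourly_cost_usd / cores


def run_cost(hourly_cost_usd: float, hours: float) -> float:
    """Total cost of keeping the whole cluster up for `hours` hours."""
    return hourly_cost_usd * hours


# Figures taken from the slides; the pairing of the two is an assumption.
HOURLY_COST_USD = 1279
CORES = 30_472

per_core = cost_per_core_hour(HOURLY_COST_USD, CORES)
print(f"about ${per_core:.3f} per core-hour")

# Hypothetical 24-hour run, chosen only for illustration.
print(f"${run_cost(HOURLY_COST_USD, 24):,.0f} for a 24-hour run")
```

Under that assumption the cluster works out to roughly four cents per core-hour, which is the sense in which compute behaves like a fungible commodity.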

http://cloudbiolinux.org/

http://usegalaxy.org/cloud

in summary

large scale data requires a rethink

data architecture

compute architecture

distributed, programmable infrastructure

cloud services

remove constraints

can we build data science platforms?

there is no magic, there is only awesome

deesingh@amazon.com Twitter: @mndoci

http://slideshare.net/mndoci

http://mndoci.com

Inspiration and ideas from Matt Wood & Larry Lessig

Credit: Oberazzi, under a CC-BY-NC-SA license