AWS Paris Summit 2014 - Closing Keynote Werner Vogels - Beyond the fridge

Post on 15-Jan-2015

AWS Summit Paris closing keynote by Werner Vogels

Transcript of AWS Paris Summit 2014 - Closing Keynote Werner Vogels - Beyond the fridge

Beyond the Fridge: The World of Connected Data

Dr. Werner Vogels, CTO, Amazon.com

The amount of information generated during the first day of a baby’s life today is equivalent to 70 times the information contained in the Library of Congress

I. Science

Observations – Theory – Models – Facts

Human Genome Project

Collaborative project to sequence every single letter of the human genetic code.

13 years and $billions to complete.

Gigabyte scale datasets (transferred between sites on iPods!)

Beyond the Human Genome

45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebra fish...

Compare genomes between species to identify biologically interesting areas of the genome.

100Gb scale datasets. Increased computational requirements.

The Next Generation

New sequencing instruments lead to a dramatic drop in cost and time required to sequence a genome.

Sequence and compare genetic code of individuals to find areas of variation. Much more interesting.

Terabyte scale datasets. Significant computational requirements.

The 1000 Genomes Project

Public/private consortium to build the world’s largest collection of human genetic variation.

Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones.

Vast, complex data and computational resources required, beyond the reach of most research groups and hospitals.

1000 Genomes in the Cloud

The 1000 Genomes data made available to all on AWS.

Stored for free as part of the Public Datasets program.

Updated regularly.

200Tb. 1700 individual genomes.

As much compute and storage as required available to all.

II. Consumer

Dropcam is the biggest inbound video service on the Web

•  More data uploaded per minute than YouTube

•  Petabytes of data processed every month

•  Billions of motion events detected

III. Retail

UNCERTAINTY

UNDERSTAND YOUR CUSTOMER

Who is my customer really?

What do people really like?

What is happening socially with my products?

Where do people consume my product?

How do people really use your product?

PERSONALIZE

75% of users select movies based on recommendations

•  More than 27 million users

•  ~ 30 million plays per day

•  More than 40 billion events per day

•  ~ 4 million ratings per day

•  ~ 3 million searches per day

•  Geo-location data

•  Device information

•  Time of day and week (it now can verify that users watch more TV shows during the week and more movies during the weekend)

•  Metadata from third parties such as Nielsen

•  Social media data from Facebook and Twitter

BIGGER IS BETTER

IV. Industrial

V. Sports

VI. Location

VII. The Pipeline

COLLECT | STORE | ORGANIZE | ANALYZE | SHARE

VIII. Real-time

What was happening yesterday?

What ... right now?

•  trades are executing

•  is the exception rate

•  is the ad click-through

•  topics are trending

•  inventory remains

•  queries are slow

•  are the high scores
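A question such as "what is the exception rate right now?" is typically answered over a sliding time window rather than over the whole history. The following is a minimal sketch of that idea in plain Python, not anything shown in the keynote: the class name `SlidingWindowRate`, the 60-second window, and the simulated traffic are all assumptions made for illustration.

```python
from collections import deque


class SlidingWindowRate:
    """Fraction of flagged events among those seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, is_exception) pairs, oldest first
        self.exceptions = 0    # flagged events currently inside the window

    def add(self, timestamp, is_exception):
        """Record one event; timestamps are assumed to arrive in order."""
        self.events.append((timestamp, is_exception))
        if is_exception:
            self.exceptions += 1
        self._evict(timestamp)

    def _evict(self, now):
        """Drop events that have fallen out of the window."""
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, was_exception = self.events.popleft()
            if was_exception:
                self.exceptions -= 1

    def rate(self):
        """Exception rate over the current window (0.0 if the window is empty)."""
        if not self.events:
            return 0.0
        return self.exceptions / len(self.events)


window = SlidingWindowRate(window_seconds=60)
for second in range(120):
    # Hypothetical traffic: one request per second. Every 10th request fails
    # in the first minute, every 2nd request fails in the second minute.
    if second < 60:
        failed = (second % 10 == 0)
    else:
        failed = (second % 2 == 0)
    window.add(second, failed)

current_rate = window.rate()
```

In this simulated run the window only covers the second minute, so `current_rate` reflects the recent spike (half of requests failing) instead of being diluted by the quieter first minute.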

Kinesis

Kinesis architecture (diagram labels)

•  Millions of sources producing 100s of terabytes per hour

•  Front End: Authentication, Authorization

•  Amazon Web Services, three AZs: durable, highly consistent storage replicates data across three data centers (availability zones)

•  Ordered stream of events supports multiple readers

•  Aggregate and archive to S3

•  Real-time dashboards and alarms

•  Machine learning algorithms or sliding window analytics

•  Aggregate analysis in Hadoop or a data warehouse

•  Inexpensive: $0.028 per million puts
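To put the quoted put price in perspective, the arithmetic can be sketched as below. Only the $0.028 per million puts figure comes from the slide; the function name `put_cost` and the workload of 10,000 puts per second are hypothetical, and any other charges that may apply to a stream are outside this sketch.

```python
from decimal import Decimal

PRICE_PER_MILLION_PUTS = Decimal("0.028")  # USD, as quoted on the slide


def put_cost(num_puts):
    """Cost in USD of `num_puts` put operations at the quoted rate."""
    return PRICE_PER_MILLION_PUTS * Decimal(num_puts) / Decimal(1_000_000)


# Hypothetical workload: 10,000 puts per second, sustained for one day.
puts_per_day = 10_000 * 60 * 60 * 24
daily_cost = put_cost(puts_per_day)
```

Under these assumptions, 864 million puts in a day come to roughly 24 US dollars in put charges.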

AWS Internal Metering Service

Clients Submitting Data | Capture Submissions | Process in Realtime | Store in Redshift

Workload

•  Tens of millions of records/sec

•  Multiple TB per hour

•  100,000s of sources

New features

•  Scale with the business

•  Provide real-time alerting

•  Inexpensive

•  Improved auditing

Internal AWS Data Warehouse

Workload

•  Daily load of billions of records from millions of files from hundreds of sources

•  3 hour SLA to load and audit data

•  Hundreds of customers

•  Hundreds of queries per hour

New features

•  Our data is fresh, we ingest every 6 hours

•  Now processing triple the volume in less than 25% of the time

•  “Hammerstone” ETL solution

   –  Built on AWS Data Pipeline

   –  Build business specific marts

   –  Build workload specific clusters

•  Supports a variety of analytics tools: Tableau, R, Toad, SQL Developer, etc.
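The claim of "triple the volume in less than 25% of the time" implies a throughput improvement that can be worked out directly. The sketch below is illustrative arithmetic only; the helper name `throughput_gain` is an assumption, and because the slide says "less than 25%", the result is a lower bound rather than a measured figure.

```python
from fractions import Fraction


def throughput_gain(volume_factor, time_factor):
    """Relative throughput = (new volume / old volume) / (new time / old time)."""
    return Fraction(volume_factor) / Fraction(time_factor)


# Triple the volume (3x) in a quarter of the time (1/4).
minimum_gain = throughput_gain(3, Fraction(1, 4))
```

Read this way, the warehouse would be moving data at least 12 times faster than before.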

Over 200 internal data sources

Data staged in Amazon S3

"Hammerstone": Custom ETL using AWS Data Pipeline

Data processing Redshift cluster

Batch reporting Redshift cluster

Ad hoc query Redshift cluster

IX. Beyond the Display

MERCURY ENERGY

CONNECTED DATA REQUIRES NO LIMITS

Cloud enables connected data collection

Cloud enables connected data processing

Cloud enables connected data collaboration

werner@amazon.com