AWS Paris Summit 2014 - Closing Keynote Werner Vogels - Beyond the fridge
Beyond the Fridge: The World of Connected Data
Dr. Werner Vogels, CTO, Amazon.com
The amount of information generated during the first day of a baby’s life today is equivalent to 70 times the information contained in the Library of Congress
I. Science
Observations – Theory – Models – Facts
Human Genome Project
Collaborative project to sequence every single letter of the human genetic code.
13 years and $billions to complete.
Gigabyte-scale datasets (transferred between sites on iPods).
Beyond the Human Genome
45+ species sequenced: mouse, rat, gorilla, rabbit, platypus, nematode, zebrafish...
Compare genomes between species to identify biologically interesting areas of the genome.
100 GB-scale datasets. Increased computational requirements.
The Next Generation
New sequencing instruments lead to a dramatic drop in cost and time required to sequence a genome.
Sequence and compare genetic code of individuals to find areas of variation. Much more interesting.
Terabyte-scale datasets. Significant computational requirements.
The 1000 Genomes Project
Public/private consortium to build the world’s largest collection of human genetic variation.
Hugely important dataset to drive new insight into known genetic traits, and the identification of new ones.
Vast, complex data and computational resources required, beyond the reach of most research groups and hospitals.
1000 Genomes in the Cloud
The 1000 Genomes data made available to all on AWS.
Stored for free as part of the Public Datasets program.
Updated regularly.
200 TB. 1700 individual genomes. As much compute and storage as required available to all.
II. Consumer
Dropcam is the biggest inbound video service on the Web
• More data uploaded per minute than YouTube
• Petabytes of data processed every month
• Billions of motion events detected
III. Retail
UNCERTAINTY
UNDERSTAND YOUR CUSTOMER
Who is my customer really?
What do people really like?
What is happening socially with my products?
Where do people consume my product?
How do people really use my product?
PERSONALIZE
75% of users select movies based on recommendations
• More than 27 million users
• ~30 million plays per day
• More than 40 billion events per day
• ~4 million ratings per day
• ~3 million searches per day
• Geo-location data
• Device information
• Time of day and week (it can now verify that users watch more TV shows during the week and more movies during the weekend)
• Metadata from third parties such as Nielsen
• Social media data from Facebook and Twitter
BIGGER IS BETTER
IV. Industrial
V. Sports
VI. Location
VII. The Pipeline
COLLECT | STORE | ORGANIZE | ANALYZE | SHARE
VIII. Real-time
What was happening yesterday?
What ___ right now?
• trades are executing
• is the exception rate
• is the ad click-through
• topics are trending
• inventory remains
• queries are slow
• are the high scores
Kinesis
Kinesis architecture
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication, authorization
• Amazon Web Services (AZ | AZ | AZ): durable, highly consistent storage replicates data across three data centers (availability zones)
• Ordered stream of events supports multiple readers
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate analysis in Hadoop or a data warehouse
• Aggregate and archive to S3
• Inexpensive: $0.028 per million puts
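The "sliding window analytics" reader on this slide, applied to the earlier question "What is the exception rate right now?", can be sketched in a few lines. This is only an illustration in plain Python, not Kinesis code: the `Event` type, the `SlidingWindowRate` class, and the 10-second window are all hypothetical, and the sketch assumes events arrive in timestamp order, as an ordered stream would deliver them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stream record: a timestamp plus an error flag."""
    timestamp: float  # seconds; assumed non-decreasing within the stream
    is_error: bool


class SlidingWindowRate:
    """Tracks the fraction of error events seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # events currently inside the window
        self._errors = 0        # number of error events inside the window

    def add(self, event: Event) -> float:
        """Add one event and return the current error rate for the window."""
        self._events.append(event)
        if event.is_error:
            self._errors += 1
        # Evict events that have fallen out of the window.
        cutoff = event.timestamp - self.window_seconds
        while self._events[0].timestamp <= cutoff:
            old = self._events.popleft()
            if old.is_error:
                self._errors -= 1
        return self._errors / len(self._events)


if __name__ == "__main__":
    window = SlidingWindowRate(window_seconds=10.0)
    stream = [Event(0.0, False), Event(1.0, True), Event(5.0, False), Event(12.0, True)]
    for ev in stream:
        rate = window.add(ev)
        print(f"t={ev.timestamp:5.1f}s  exception rate={rate:.2f}")
```

A dashboard or alarm reader would compare the returned rate against a threshold; because the stream supports multiple readers, the same events could feed this calculation and an archival job independently.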
AWS Internal Metering Service
Clients Submitting Data | Capture Submissions | Process in Realtime | Store in Redshift
Workload
• Tens of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
New features
• Scale with the business
• Provide real-time alerting
• Inexpensive
• Improved auditing
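To put "Inexpensive" in rough numbers, the put price quoted on the Kinesis slide ($0.028 per million puts) can be applied to a workload of this size. The sketch below is back-of-the-envelope arithmetic only. It assumes one record per put and a rate of 10 million records per second (the low end of "tens of millions"), and it counts put charges alone; none of these assumptions come from the slides, and the function name is made up for illustration.

```python
PRICE_PER_MILLION_PUTS_USD = 0.028  # figure quoted on the Kinesis slide


def put_cost_usd(puts: int) -> float:
    """Cost of `puts` put operations at the quoted per-million price."""
    return puts / 1_000_000 * PRICE_PER_MILLION_PUTS_USD


if __name__ == "__main__":
    records_per_second = 10_000_000               # assumption: low end of the stated workload
    puts_per_hour = records_per_second * 3600     # assumption: one record per put
    print(f"puts per hour: {puts_per_hour:,}")
    print(f"put charges per hour: about ${put_cost_usd(puts_per_hour):,.0f}")
```

Under those assumptions the put charges come to roughly a thousand dollars per hour; batching several records into one put would lower the figure proportionally.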
Internal AWS Data Warehouse
Workload
• Daily load of billions of records from millions of files from hundreds of sources
• 3 hour SLA to load and audit data
• Hundreds of customers
• Hundreds of queries per hour
New features
• Our data is fresh, we ingest every 6 hours
• Now processing triple the volume in less than 25% of the time
• “Hammerstone” ETL solution
– Built on AWS Data Pipeline
– Build business specific marts
– Build workload specific clusters
• Supports a variety of analytics tools: Tableau, R, Toad, SQL Developer, etc.
Over 200 internal data sources | Data staged in Amazon S3 | "Hammerstone": custom ETL using AWS Data Pipeline | Redshift clusters: data processing, batch reporting, ad hoc query
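The claim "triple the volume in less than 25% of the time" implies a throughput improvement that is easy to work out: throughput is volume divided by time, so the gain is the volume ratio divided by the time ratio. The helper below is a hypothetical illustration of that arithmetic, not anything from the presentation.

```python
def throughput_gain(volume_ratio: float, time_ratio: float) -> float:
    """Relative throughput (volume per unit time) after a change.

    volume_ratio: new volume / old volume (3.0 for "triple the volume")
    time_ratio:   new time / old time (0.25 for "25% of the time")
    """
    if time_ratio <= 0:
        raise ValueError("time_ratio must be positive")
    return volume_ratio / time_ratio


if __name__ == "__main__":
    # At exactly 25% of the time the gain is 12x; "less than 25%" means more than 12x.
    print(throughput_gain(3.0, 0.25))
```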
IX. Beyond the Display
MERCURY ENERGY
CONNECTED DATA REQUIRES NO LIMITS
Cloud enables connected data collection
Cloud enables connected data processing
Cloud enables connected data collaboration