Copyright © 2013 OneSpot, Proprietary & Confidential 1
Amazon Redshift: How we managed 300 billion rows with no DBA
Matt Cohen
Founder & [email protected]
December 10th, 2013
What is OneSpot?
• OneSpot is a content advertising platform that distributes content as ads that people want to click on.
– Fortune 2000 clients
– Real-time ad exchange bidding
– Adaptive machine learning
– Seed funded until $5.3M Series A last month
• Big data, big analysis
What is Redshift?
1. When light from a receding object appears shifted to the red end of the spectrum
– A consequence of the expanding universe.
2. A cheap, fast, petabyte-scale, managed SQL data warehouse service from Amazon Web Services
– A consequence of the expanding cloud ecosystem
Why Redshift?
• Cheap
• Fast
• Petabyte-scale
• Managed Service
• SQL
• Data Warehouse
• From AWS
SQL Data Warehouse
• Based on the commercial ParAccel database
– Which is based on Postgres
• Standards-based tools and knowledge
• Built for data warehousing
– Column-oriented
– Cluster architecture
– Read optimized
– No relational integrity
– Almost no SQL extensions
SQL Data Warehouse
• Column-oriented
• 11 different compression techniques
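Why column orientation and compression go together can be seen in a small sketch. This is not Redshift's implementation: the sample rows are made up, and run-length encoding is used only as one illustrative scheme, not as a statement about which of the eleven techniques applies to which column.

```python
from itertools import groupby

# Hypothetical sample rows: (country, device, clicks)
rows = [
    ("US", "mobile", 3),
    ("US", "mobile", 5),
    ("US", "desktop", 2),
    ("CA", "desktop", 7),
    ("CA", "desktop", 1),
]

def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

country, device, clicks = to_columns(rows)

# Values of one column sit next to each other, so repeats compress well.
print(run_length_encode(country))  # [('US', 3), ('CA', 2)]

# An aggregate over one column never touches the other columns.
print(sum(clicks))  # 18
```

Stored row by row, the same repeated country values would be interleaved with device and click data, and a run-length scheme would find nothing to collapse.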
SQL Data Warehouse
• Cluster architecture
SQL Data Warehouse
• Read optimized
– Large block size (1MB)
– Data replication
• 2x live, 1x S3
• No relational integrity
– No indexes: sort and distribution keys
• Almost no SQL extensions
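A toy model can show how sort and distribution keys stand in for indexes. Everything here is an assumption made for illustration: the slice count, the block size, the CRC32 hash, and the sample rows are hypothetical, and the slides do not describe Redshift's internals. The idea is that the distribution key decides where a row lives, and the sort key lets a range query skip whole blocks by checking each block's minimum and maximum.

```python
import zlib

NUM_SLICES = 4   # hypothetical number of slices in the cluster
BLOCK_SIZE = 2   # rows per block, tiny on purpose

def slice_for(dist_value, num_slices=NUM_SLICES):
    """Pick a slice by hashing the distribution key value."""
    return zlib.crc32(str(dist_value).encode("utf-8")) % num_slices

def build_blocks(rows, sort_key, block_size=BLOCK_SIZE):
    """Sort rows by the sort key and record min/max per block."""
    ordered = sorted(rows, key=lambda row: row[sort_key])
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        blocks.append({
            "min": chunk[0][sort_key],
            "max": chunk[-1][sort_key],
            "rows": chunk,
        })
    return blocks

def range_scan(blocks, sort_key, low, high):
    """Return matching rows and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # whole block is outside the range, skip it
        blocks_read += 1
        hits.extend(r for r in block["rows"] if low <= r[sort_key] <= high)
    return hits, blocks_read

# Hypothetical rows: one impression count per campaign per day.
rows = [{"campaign_id": 7, "day": d, "impressions": 100 * d} for d in range(1, 7)]

blocks = build_blocks(rows, sort_key="day")
hits, blocks_read = range_scan(blocks, "day", low=3, high=4)
print(len(blocks), blocks_read, [r["day"] for r in hits])  # 3 1 [3, 4]

# Equal distribution key values always hash to the same slice.
print(slice_for(7) == slice_for(7))  # True
```

In this sketch only one of three blocks is read for the range query, which is the effect a well-chosen sort key is meant to have on a date-bounded report.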
Fast = Cheap
• Starts with 1 XL node
– 85¢ an hour ($620/month) on demand
– 50¢ an hour ($365) 1 year reserved
• Benchmarks say:
– Scales linearly
– 5-10x faster than Hadoop/Hive
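The monthly figures on this slide follow from the hourly rates. The short check below assumes 730 billable hours per month (365 days times 24 hours, divided by 12), which is an assumption and not something the slide states.

```python
HOURS_PER_MONTH = 730  # assumption: 365 * 24 / 12

def monthly_cost(hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Monthly cost in USD for one node billed at a flat hourly rate."""
    return hourly_rate_usd * hours

on_demand = monthly_cost(0.85)  # 85 cents an hour, on demand
reserved = monthly_cost(0.50)   # 50 cents an hour, 1 year reserved

print(round(on_demand, 2))  # 620.5
print(round(reserved, 2))   # 365.0

# Fraction saved by reserving for a year instead of paying on demand.
print(round(1 - reserved / on_demand, 3))  # 0.412
```

At these rates the one-year reservation works out to roughly 41% less than on-demand pricing for a single XL node.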
Petabyte scale
• Up to
– 32 XL nodes (64 Terabytes)
– 100 8XL nodes (1.6 Petabytes)
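Dividing the cluster maximums by the node counts gives the implied storage per node. The sketch below uses decimal units (1 PB = 1000 TB), and the 10 TB dataset in the sizing example is hypothetical.

```python
import math

def per_node_tb(total_tb, nodes):
    """Storage per node implied by a cluster's total capacity."""
    return total_tb / nodes

xl_tb = per_node_tb(64, 32)          # 32 XL nodes hold 64 TB
eight_xl_tb = per_node_tb(1600, 100)  # 100 8XL nodes hold 1.6 PB = 1600 TB

print(xl_tb)        # 2.0
print(eight_xl_tb)  # 16.0

def nodes_needed(dataset_tb, node_tb):
    """Smallest whole number of nodes that fits the dataset."""
    return math.ceil(dataset_tb / node_tb)

# Hypothetical 10 TB dataset on XL nodes.
print(nodes_needed(10, xl_tb))  # 5
```

This counts raw capacity only and ignores compression, replication, and working space, so real sizing would differ.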
Managed Service from AWS
• Managed Service
– Incredibly easy
– Nice UI
– Most SQL tools
• From AWS
– Free data transfer
– Easy load from S3
– Use AWS Data Pipeline
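Loading from S3 is done with Redshift's COPY command. The helper below only builds the statement text; the table name, bucket path, and credential placeholders are hypothetical, and actually running the statement requires a database connection through a PostgreSQL-compatible client, which is not shown here.

```python
def build_copy_statement(table, s3_path, access_key_id, secret_access_key,
                         delimiter="|", gzip=True):
    """Build a Redshift COPY statement that loads delimited files from S3."""
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}"
    )
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"CREDENTIALS '{credentials}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return " ".join(parts) + ";"

# Hypothetical table and bucket; the key values are placeholders, not real keys.
statement = build_copy_statement(
    table="events",
    s3_path="s3://example-bucket/events/",
    access_key_id="<access-key-id>",
    secret_access_key="<secret-access-key>",
)
print(statement)
```

String formatting is used here only to keep the sketch short; values that come from user input should not be interpolated into SQL this way.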
The TL;DR
• Pros
– Standard SQL
– Super easy
– Very fast
– Affordable
– Integrates with AWS
– No DBA
– No Sysadmin
• Cons
– Standard SQL
– Almost no SQL extensions
– Best with Star Schema
• Big joins can be slow
– No MapReduce
– Fixed columns
– Consistency
– 1.6 Petabyte limit