2 one spot redshift bigdatacamp 1.02


Page 1: 2 one spot redshift bigdatacamp 1.02

Copyright © 2013 OneSpot, Proprietary & Confidential 1

Amazon Redshift: How we managed 300 billion rows with no DBA

Matt Cohen

Founder & President

[email protected]

December 10th, 2013

Page 2: 2 one spot redshift bigdatacamp 1.02

What is OneSpot?

• OneSpot is a content advertising platform that distributes content as ads that people want to click on.

– Fortune 2000 clients

– Real-time ad exchange bidding

– Adaptive machine learning

– Seed funded until $5.3M Series A last month

• Big data, big analysis


Page 3: 2 one spot redshift bigdatacamp 1.02

What is Redshift?

1. When light from a receding object appears shifted to the red end of the spectrum

– A consequence of the expanding universe.

2. A cheap, fast, petabyte-scale, managed SQL data warehouse service from Amazon Web Services

– A consequence of the expanding cloud ecosystem


Page 4: 2 one spot redshift bigdatacamp 1.02

Why Redshift?

• Cheap

• Fast

• Petabyte-scale

• Managed Service

• SQL

• Data Warehouse

• From AWS


Page 5: 2 one spot redshift bigdatacamp 1.02

SQL Data Warehouse

• Based on the commercial ParAccel database

– Which is based on Postgres

• Standards-based tools and knowledge

• Built for data warehousing

– Column-oriented

– Cluster architecture

– Read optimized

– No relational integrity

– Almost no SQL extensions


Page 6: 2 one spot redshift bigdatacamp 1.02

SQL Data Warehouse

• Column-oriented


Page 7: 2 one spot redshift bigdatacamp 1.02

SQL Data Warehouse

• Column-oriented

• 11 different compression techniques
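A minimal Python sketch of why column orientation and compression go together. It pivots a few rows into columns and applies run-length encoding, chosen here only as a familiar example of a compression technique. The sample rows and the function names `to_columns` and `run_length_encode` are hypothetical, and this is not a description of Redshift internals.

```python
from itertools import groupby

# Hypothetical row-oriented records: (date, country, clicks).
rows = [
    ("2013-12-01", "US", 3),
    ("2013-12-01", "US", 5),
    ("2013-12-01", "CA", 2),
    ("2013-12-02", "CA", 7),
]

def to_columns(records):
    """Pivot a list of row tuples into one list per column."""
    return [list(column) for column in zip(*records)]

def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

dates, countries, clicks = to_columns(rows)

# Values within a single column tend to repeat, so they compress well.
encoded_dates = run_length_encode(dates)
encoded_countries = run_length_encode(countries)
```

Stored row by row, the repeated dates and countries are interleaved with other fields; stored by column, they sit next to each other and collapse into a few pairs.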


Page 8: 2 one spot redshift bigdatacamp 1.02

SQL Data Warehouse

• Cluster architecture


Page 9: 2 one spot redshift bigdatacamp 1.02

SQL Data Warehouse

• Read optimized

– Large block size (1 MB)

– Data replication

• 2 live copies, 1 backup in S3

• No relational integrity

– No indexes: sort and distribution keys

• Almost no SQL extensions
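A conceptual Python sketch of what sort and distribution keys do in place of indexes: the distribution key decides which node holds a row, and the sort key orders rows within a node so that a range lookup can find its bounds by binary search. The CRC32 hash, the four-node cluster, and all function names are assumptions for illustration; Redshift's actual hashing and storage layout are not shown in the slides.

```python
import bisect
import zlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(dist_key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node by hashing the distribution key (illustrative hash only)."""
    return zlib.crc32(dist_key.encode("utf-8")) % num_nodes

def distribute(rows, num_nodes: int = NUM_NODES):
    """Place each (dist_key, sort_key, payload) row on a node, sorted by sort_key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[0], num_nodes)].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda r: r[1])
    return nodes

def range_scan(node_rows, low, high):
    """Return rows with low <= sort_key <= high, locating the bounds by binary search."""
    keys = [r[1] for r in node_rows]
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return node_rows[start:end]

# Hypothetical rows: (campaign id as distribution key, date as sort key, clicks).
rows = [
    ("campaign-a", "2013-12-01", 10),
    ("campaign-b", "2013-12-02", 20),
    ("campaign-a", "2013-12-03", 30),
]
nodes = distribute(rows)
campaign_a_node = nodes[node_for("campaign-a")]
december_third_onward = range_scan(campaign_a_node, "2013-12-03", "2013-12-31")
```

Rows that share a distribution key always land on the same node, which is why joining or grouping on that key avoids moving data between nodes.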


Page 10: 2 one spot redshift bigdatacamp 1.02

Fast = Cheap

• Starts with 1 XL node

– 85¢ an hour ($620/month) on demand

– 50¢ an hour ($365/month) with a 1-year reservation

• Benchmarks say:

– Scales linearly

– 5-10x faster than Hadoop/Hive
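The monthly figures above follow from the hourly rates if a month is taken as 730 hours (8,760 hours a year divided by 12). That averaging convention is an assumption; the slide does not say how the monthly numbers were derived. A small Python sketch, with a hypothetical `monthly_cost` helper:

```python
# Average hours in a month: 365 days * 24 hours / 12 months (an assumption).
HOURS_PER_MONTH = 365 * 24 / 12

def monthly_cost(hourly_rate_usd: float, nodes: int = 1) -> float:
    """Estimate the monthly cost of running `nodes` nodes around the clock."""
    return round(hourly_rate_usd * HOURS_PER_MONTH * nodes, 2)

on_demand = monthly_cost(0.85)  # 2013 on-demand XL rate from the slide
reserved = monthly_cost(0.50)   # 2013 1-year reserved XL rate from the slide
```

Passing a larger `nodes` value gives a rough cluster-wide estimate under the same assumptions.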


Page 11: 2 one spot redshift bigdatacamp 1.02

Petabyte scale

• Up to

– 32 XL nodes (64 Terabytes)

– 100 8XL nodes (1.6 Petabytes)
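The limits above imply a per-node capacity of 2 TB for XL nodes and 16 TB for 8XL nodes, taking 1 petabyte as 1,000 terabytes. A rough Python sizing sketch based only on those figures; the `nodes_needed` helper is hypothetical, and it ignores compression, which would reduce the node count.

```python
import math

# Per-node capacity implied by the slide's cluster limits.
TB_PER_XL_NODE = 64 / 32       # "32 XL nodes (64 Terabytes)"
TB_PER_8XL_NODE = 1600 / 100   # "100 8XL nodes (1.6 Petabytes)", 1 PB = 1000 TB

def nodes_needed(data_tb: float, tb_per_node: float) -> int:
    """Smallest number of nodes whose combined capacity holds data_tb terabytes."""
    return math.ceil(data_tb / tb_per_node)

xl_nodes_for_10_tb = nodes_needed(10, TB_PER_XL_NODE)
big_nodes_for_200_tb = nodes_needed(200, TB_PER_8XL_NODE)
```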


Page 12: 2 one spot redshift bigdatacamp 1.02

Managed Service from AWS

• Managed Service

– Incredibly easy

– Nice UI

– Most SQL tools

• From AWS

– Free data transfer

– Easy load from S3

– Use AWS Data Pipeline
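Loading from S3 is done with Redshift's `COPY` command. The Python sketch below only assembles such a statement as text; the table name, bucket path, and role ARN are placeholders, the `build_copy_statement` helper is hypothetical, and the slides do not show which load options OneSpot used. The inputs are assumed to be trusted constants, since the values are interpolated directly into SQL.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str,
                         delimiter: str = "|", gzip: bool = True) -> str:
    """Return a Redshift COPY statement that bulk-loads delimited files from S3."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return "\n".join(parts) + ";"

# Placeholder values for illustration only.
statement = build_copy_statement(
    table="events",
    s3_prefix="s3://example-bucket/events/2013-12-10/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
)
```

The resulting statement could then be run through any Postgres-compatible SQL client or driver connected to the cluster.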


Page 13: 2 one spot redshift bigdatacamp 1.02

The TL;DR

• Pros

– Standard SQL

– Super easy

– Very fast

– Affordable

– Integrates with AWS

– No DBA

– No Sysadmin

• Cons

– Standard SQL

– Almost no SQL extensions

– Best with Star Schema

• Big joins can be slow

– No MapReduce

– Fixed columns

– Consistency

– 1.6 petabyte limit


Page 14: 2 one spot redshift bigdatacamp 1.02


Amazon Redshift: How we managed 300 billion rows with no DBA

Matt Cohen

Founder & President

[email protected]

December 10th, 2013