Amazon Redshift - Meetupfiles.meetup.com/4035202/AmazonRedshiftMeetup.pdf · Amazon Redshift runs...

Vidhya Srinivasan| [email protected] Neil Thombre | [email protected] Amazon Redshift


Page 1: Amazon Redshift - Meetupfiles.meetup.com/4035202/AmazonRedshiftMeetup.pdf · Amazon Redshift runs on optimized hardware HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed

Vidhya Srinivasan| [email protected] Neil Thombre | [email protected]

Amazon Redshift

Page 2:

What is a Data Warehouse?

•  Large data volumes (TB to PB)
•  Queries are complex and IO intensive
•  Data typically loaded in batches
•  Integrates with Business Intelligence tools for reporting and analysis

Page 3:

DW - Existing AWS landscape

Criteria: Scale Out | Fully SQL Compatible | Optimized data import & export | Efficient Aggregates & Joins | Local storage | No single point of failure

RDS: X X
DynamoDB: X X X
EMR/Hadoop: X X ½ X

Page 4:

DW - Existing AWS landscape

Criteria: Scale Out | Fully SQL Compatible | Optimized data import & export | Efficient Aggregates & Joins | Local storage | No single point of failure

RDS: X X
DynamoDB: X X X
EMR/Hadoop: X X ½ X
Redshift: X X X X X X

Page 5:

Introducing Amazon Redshift

•  Fully managed database service
•  Built from the ground up for DW
•  Secure & Reliable – Fault tolerant, automatic backup, encryption
•  Fast – Scale out, specialized hardware, columnar storage
•  Inexpensive – 1/10th the cost of alternatives, pay as you go
•  Easy to Use – Provision & resize with a few clicks
•  Compatible – JDBC/ODBC, mostly PostgreSQL compatible

Page 6:

Why did we call it Amazon Redshift?

Edwin Hubble, 1889 – 1953

Page 7:

>> How much storage is provisioned by Redshift customers?

>> How many Redshift clusters were created in first 10 weeks?

Page 8:

Amazon Redshift architecture

•  Leader Node
   –  SQL endpoint
   –  Stores metadata
   –  Coordinates query execution
•  Compute Nodes
   –  Local, columnar storage
   –  Execute queries in parallel
   –  Load, backup, restore via Amazon S3
   –  Parallel load from Amazon DynamoDB
•  Single node version available

[Diagram: SQL Clients/BI Tools, JDBC/ODBC, Leader Node, 10 GigE (HPC), three Compute Nodes, and Amazon S3 (Ingestion, Backup, Restore); each node is labeled 128GB RAM, 16TB disk, 16 cores]

Page 9:

Amazon Redshift dramatically reduces I/O

•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes

ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375

•  With row storage you do unnecessary I/O
•  To get total amount, you have to read everything

Page 10:

Amazon Redshift dramatically reduces I/O

•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes

ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375

•  With column storage, you only read the data you need
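To make the row-versus-column point concrete, here is a minimal Python sketch built on the four-row table from the slide. It only counts how many values each layout touches to total the Amount column. The function names are illustrative assumptions, and the in-memory lists stand in for disk blocks; this is not how Redshift stores data internally.

```python
# The four-row example table from the slide: (ID, Age, State, Amount).
COLUMNS = ("id", "age", "state", "amount")
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(rows):
    """Row layout: every field of every row is read to reach 'amount'."""
    amount_index = COLUMNS.index("amount")
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[amount_index]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(to_column_store(rows))
print(row_total, row_reads)  # 1250 16
print(col_total, col_reads)  # 1250 4
```

Both layouts give the same total, but the row layout touches all 16 values while the column layout touches only the 4 it needs.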

Page 11:

Amazon Redshift dramatically reduces I/O

•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes

•  Columnar compression saves space & reduces I/O
•  Amazon Redshift analyzes and compresses your data

analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
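As a rough illustration of why per-column encodings save space, the Python sketch below shows the general idea behind two families named in the output above: delta encoding and dictionary encoding. The sample values and function names are assumptions made for this example, and the code is a simplified model, not Redshift's actual on-disk format.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    out = []
    running = 0
    for i, value in enumerate(encoded):
        running = value if i == 0 else running + value
        out.append(running)
    return out


def dict_encode(values):
    """Replace each value with a small index into a list of distinct values."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


# Hypothetical sample data: nearly sequential IDs and a low-cardinality column.
list_ids = [1000, 1001, 1002, 1005, 1006]
states = ["CA", "WA", "FL", "WA"]

encoded_ids = delta_encode(list_ids)
dictionary, codes = dict_encode(states)
print(encoded_ids)        # [1000, 1, 1, 3, 1]
print(dictionary, codes)  # ['CA', 'WA', 'FL'] [0, 1, 2, 1]
```

Small differences and small dictionary indexes fit in fewer bytes than the original values, which is why sorted or repetitive columns tend to compress well.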

Page 12:

Amazon Redshift dramatically reduces I/O

•  Column storage
•  Data compression
•  Direct-attached storage
•  Large data block sizes

•  Track the minimum and maximum value for each block
•  Skip over blocks that don’t contain the data needed for a given query
•  Minimize unnecessary I/O
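The block-skipping idea above (zone maps, in the deck's terms) can be sketched in a few lines of Python. The block size, sample values, and function names are assumptions for illustration only; real blocks are far larger and live on disk.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def scan_equal(blocks, zone_map, target):
    """Return matching values, reading only blocks whose range can contain target."""
    matches = []
    blocks_read = 0
    for block, (low, high) in zip(blocks, zone_map):
        if target < low or target > high:
            continue  # the min/max range rules this block out, so skip the read
        blocks_read += 1
        matches.extend(value for value in block if value == target)
    return matches, blocks_read


# Hypothetical sorted column; sorting keeps each block's range narrow.
ages = [20, 21, 23, 25, 30, 31, 33, 37, 40, 41, 45, 48]
blocks, zone_map = build_zone_map(ages, block_size=4)
matches, blocks_read = scan_equal(blocks, zone_map, 37)
print(zone_map)              # [(20, 25), (30, 37), (40, 48)]
print(matches, blocks_read)  # [37] 1
```

Only one of the three blocks is read for this lookup. The benefit depends on the data being sorted or clustered, since overlapping ranges would force more blocks to be read.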

Page 13:

Amazon Redshift dramatically reduces I/O

•  Column storage
•  Data compression
•  Zone maps
•  Direct-attached storage
•  Large data block sizes

•  Use direct-attached storage to maximize throughput
•  Hardware optimized for high performance data processing
•  Large block sizes to make the most of each read
•  Amazon Redshift manages durability for you

Page 14:

Amazon Redshift runs on optimized hardware

HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage

•  Optimized for I/O intensive workloads
•  High disk density
•  Runs in HPC - fast network
•  HS1.8XL available on Amazon EC2
•  Need to leverage all the nodes

[Diagram: one node labeled 16 GB RAM, 2 TB disk, 2 cores and one node labeled 128 GB RAM, 16 TB disk, 16 cores]

Page 15:

Amazon Redshift parallelizes and distributes everything

•  Query
•  Load
•  Backup/Restore
•  Resize

Page 16:

Amazon Redshift parallelizes and distributes everything

•  Query
•  Load
•  Backup/Restore
•  Resize

•  Load in parallel from Amazon S3 or Amazon DynamoDB
•  Data automatically distributed and sorted according to DDL
•  Scales linearly with number of nodes

[Diagram: Amazon S3/DynamoDB and three Compute Nodes, each labeled 128GB RAM, 16TB disk, 16 cores]
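The Python sketch below illustrates the two ideas on this slide: rows are spread across nodes by a distribution key and kept sorted on each node, and work is then done on every node in parallel with the partial results combined at the end. It is a toy model under stated assumptions: the modulo operation stands in for whatever hashing the service actually uses, threads stand in for compute nodes, and the sample rows and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key_index, num_nodes):
    """Assign each row to a node by its distribution key (modulo stands in for a hash)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key_index] % num_nodes].append(row)
    return nodes


def sort_on_each_node(nodes, sort_index):
    """Keep every node's rows ordered by the sort key."""
    return [sorted(node, key=lambda row: row[sort_index]) for node in nodes]


def parallel_sum(nodes, value_index):
    """Each node sums its own rows; the partial sums are then combined."""
    def partial_sum(node):
        return sum(row[value_index] for row in node)

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(partial_sum, nodes))
    return partials, sum(partials)


# Hypothetical rows: (id, amount). The id is the distribution key,
# and the amount is the sort key.
rows = [(1, 500), (2, 250), (3, 125), (4, 375), (5, 100), (6, 50)]
nodes = sort_on_each_node(distribute(rows, key_index=0, num_nodes=3), sort_index=1)
partials, total = parallel_sum(nodes, value_index=1)
print(nodes[0])         # [(6, 50), (3, 125)]
print(partials, total)  # [175, 875, 350] 1400
```

Because each node holds a disjoint share of the rows, adding nodes shrinks the share each one has to process, which is the intuition behind the linear scaling claim.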

Page 17:

Amazon Redshift parallelizes and distributes everything

•  Query
•  Load
•  Backup/Restore
•  Resize

•  Backups to Amazon S3 are automatic, continuous and incremental
•  Configurable system snapshot retention period
•  Take user snapshots on-demand
•  Streaming restores enable you to resume querying faster

[Diagram: Amazon S3 and three Compute Nodes, each labeled 128GB RAM, 16TB disk, 16 cores]

Page 18:

Amazon Redshift parallelizes and distributes everything

•  Query
•  Load
•  Backup/Restore
•  Resize

•  Resize while remaining online
•  Provision a new cluster in the background
•  Copy data in parallel from node to node
•  Only charged for source cluster

[Diagram: SQL Clients/BI Tools and two clusters, one with a Leader Node and three Compute Nodes and one with a Leader Node and four Compute Nodes; every node is labeled 128GB RAM, 48TB disk, 16 cores]

Page 19:

Amazon Redshift parallelizes and distributes everything

•  Query
•  Load
•  Backup/Restore
•  Resize

•  Automatic SQL endpoint switchover via DNS
•  Decommission the source cluster
•  Simple operation via AWS Console or API

[Diagram: SQL Clients/BI Tools and a single cluster with a Leader Node and four Compute Nodes; every node is labeled 128GB RAM, 48TB disk, 16 cores]

Page 20:

Amazon Redshift lets you start small and grow big

Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
•  Single Node (2 TB)
•  Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
•  Cluster 2-100 Nodes (32 TB – 1.6 PB)

[Diagram: grids of XL and 8XL node icons]

Note: Nodes not to scale
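The capacity ranges quoted on this slide follow directly from per-node storage multiplied by node count. The short Python sketch below reproduces that arithmetic and adds a sizing helper. The class and method names are assumptions made for illustration, 1 PB is taken as 1000 TB, and the single-node HS1.XL option is left out of the cluster range.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    storage_tb: int         # compressed storage per node, from the slide
    min_cluster_nodes: int  # smallest multi-node cluster
    max_cluster_nodes: int  # largest cluster

    def capacity_range_tb(self):
        """Smallest and largest cluster capacity in TB."""
        return (
            self.storage_tb * self.min_cluster_nodes,
            self.storage_tb * self.max_cluster_nodes,
        )

    def nodes_needed(self, data_tb):
        """Smallest cluster that holds data_tb, or None if it exceeds the maximum."""
        nodes = max(self.min_cluster_nodes, math.ceil(data_tb / self.storage_tb))
        return nodes if nodes <= self.max_cluster_nodes else None


HS1_XL = NodeType("HS1.XL", storage_tb=2, min_cluster_nodes=2, max_cluster_nodes=32)
HS1_8XL = NodeType("HS1.8XL", storage_tb=16, min_cluster_nodes=2, max_cluster_nodes=100)

print(HS1_XL.capacity_range_tb())   # (4, 64)
print(HS1_8XL.capacity_range_tb())  # (32, 1600)
print(HS1_8XL.nodes_needed(100))    # 7
```

The 1600 TB upper bound for HS1.8XL matches the 1.6 PB figure on the slide.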

Page 21: