Amazon Redshift Performance Metrics vs Competitors
Uploaded by mohit-kanjwani
Self-Hosting
According to Amazon’s calculation, “it generally costs between $19,000 and $25,000 per terabyte per year, at list prices, to build and run a good-sized data warehouse on your own. Amazon Redshift, all-in, will cost you less than $1,000 per terabyte per year.”
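The per-terabyte figures quoted above can be turned into a quick comparison. The sketch below is only arithmetic on those quoted list prices; the 10 TB warehouse size is a hypothetical example, and because Redshift is quoted as "less than" $1,000, the resulting ratios are lower bounds.

```python
# Figures quoted from Amazon (USD per terabyte per year, list prices).
SELF_HOSTED_LOW = 19_000
SELF_HOSTED_HIGH = 25_000
REDSHIFT_MAX = 1_000  # quoted as "less than $1,000", so treated as an upper bound


def yearly_cost(terabytes, rate_per_tb):
    """Yearly cost in USD for a warehouse of the given size."""
    return terabytes * rate_per_tb


def cost_ratio_range():
    """Lower-bound ratios of self-hosted cost to Redshift cost (low, high)."""
    return SELF_HOSTED_LOW / REDSHIFT_MAX, SELF_HOSTED_HIGH / REDSHIFT_MAX


if __name__ == "__main__":
    size_tb = 10  # hypothetical warehouse size, not from the source
    print("Self-hosted, low estimate: ", yearly_cost(size_tb, SELF_HOSTED_LOW))
    print("Self-hosted, high estimate:", yearly_cost(size_tb, SELF_HOSTED_HIGH))
    print("Redshift, upper bound:     ", yearly_cost(size_tb, REDSHIFT_MAX))
    print("Self-hosted / Redshift ratio (at least):", cost_ratio_range())
```

On the quoted numbers, self-hosting comes out at least 19 to 25 times more expensive per terabyte per year.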
Redshift vs other vendor offerings
Feature | Redshift | Teradata | HP Vertica | EMC Greenplum | Oracle Database
Columnar Data Storage | Available | Available | Available | Available | Available
Advanced Compression | Available | Available | Available | Available | Available
Supports ‘Sort key’ for better dynamic sorts | Supported | Not Supported | Not Supported | Not Supported | Not Supported
Can run on ‘Virtualized Platforms’ | Yes. Since Amazon Redshift is built upon PostgreSQL, it has an inherent capability to run on commodity machines running virtual platforms | Information not Available | Not Supported. Vertica 6.1 does support Hardware Virtual Machine, but it is nowhere close to Redshift’s offering of Data as a Service | Information not Available | Information not Available
Index Support | Not Available | Supported | Not Supported | No Information Available | Supported
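To illustrate why a sort key matters for range queries, here is a toy model in Python. It is a sketch of the general idea (storing a minimum and maximum per block so that whole blocks can be skipped), not a description of Redshift's internals; the `Block`, `make_blocks` and `range_scan` names and the block size are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A fixed-size chunk of one column, with min/max metadata."""
    values: List[int]

    @property
    def lo(self) -> int:
        return min(self.values)

    @property
    def hi(self) -> int:
        return max(self.values)


def make_blocks(values, block_size):
    """Split a column into consecutive blocks of block_size values."""
    return [Block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def range_scan(blocks, low, high):
    """Return (matching values, number of blocks actually read)."""
    matches = []
    blocks_read = 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # skipped using the min/max metadata alone
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


if __name__ == "__main__":
    sorted_column = list(range(1000))
    # The same 1000 values in a scattered (unsorted) order.
    scattered_column = [(i * 37) % 1000 for i in range(1000)]

    _, sorted_reads = range_scan(make_blocks(sorted_column, 100), 250, 260)
    _, scattered_reads = range_scan(make_blocks(scattered_column, 100), 250, 260)
    print("blocks read, sorted column:   ", sorted_reads)
    print("blocks read, scattered column:", scattered_reads)
```

When the column is stored in sorted order, the requested range falls inside a single block and the rest are skipped; with the same values scattered, every block's min/max range overlaps the query and far more blocks have to be read.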
Redshift vs the Hadoop Open Source Platform
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data.
Criterion | Redshift | Hadoop
Nodes Possible | 100 | Unlimited
Max Node Size | 16 TB | Unlimited
Performance | Performs better on terabyte-level data (which is usually sufficient for most businesses) | Performs better on petabyte-level data (only relevant for large businesses, which will want to maintain their own warehouse anyway)
Ease of Migration | As it uses PostgreSQL as the underlying database and is queried with SQL, it is already familiar to most developers | System administrators will need to learn the Hadoop architecture and tools, as they are quite different, and developers will need to learn to code in Pig or MapReduce
Data formats accepted | Limited. Presently no support for XML, data arrays, etc. | All data types supported
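The "Ease of Migration" row can be made concrete with a small sketch. The Python below computes the same per-region total twice: once with a SQL query and once in a hand-written map/shuffle/reduce style. It uses the standard-library `sqlite3` module purely as a stand-in SQL engine (it is not Redshift), the in-memory loop is only an imitation of the MapReduce programming model (it is not Hadoop), and the `sales` table and sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount).
ROWS = [("north", 120), ("south", 80), ("north", 30), ("east", 50)]


def totals_with_sql(rows):
    """Aggregate with a single declarative SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"))
    conn.close()
    return result


def map_phase(row):
    """Emit (key, value) pairs for one input record."""
    region, amount = row
    yield region, amount


def reduce_phase(key, values):
    """Combine all values that share a key."""
    return key, sum(values)


def totals_with_mapreduce(rows):
    """Aggregate in the map / shuffle / reduce style."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_phase(row):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    print(totals_with_sql(ROWS))
    print(totals_with_mapreduce(ROWS))
```

Both functions return the same totals, but the SQL version is one query that most developers can already read, while the MapReduce version requires the developer to write the map, grouping and reduce steps explicitly.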
Total Cost of Running Hadoop vs Redshift on a per Query basis
Thus we can conclude that Redshift is better suited to most businesses. The exception is very large organisations (for example, a database for the entire Tata Group), where Hadoop might be a better choice, albeit at a higher cost than Redshift.
Query Performance with other technologies
Some Additional Information which I thought might be useful for other parts of the project
The distinction between the previously available Amazon Relational Database Service (RDS) and Redshift is that the latter is exclusively for warehousing and analytics (as opposed to transactional database uses) and is capable of big-data scale. "RDS is based on Microsoft SQL Server, Oracle and MySQL, and those aren't systems that are designed to do petabyte-scale data warehousing."
http://www.informationweek.com/software/information-management/amazon-debuts-low-cost-big-data-warehousing/d/d-id/1107568?
http://dwh-bi-etl-reviews.quora.com/Amazon-Redshift-%E2%80%93-Differentiators-and-Limitations
http://www.vertica.com/2010/11/23/life-beyond-indices-the-query-benefits-of-storing-sorted-data/
http://aws.amazon.com/documentation/redshift/
http://snowplowanalytics.com/blog/2013/09/27/how-much-does-snowplow-cost-to-run/