AWS Webcast - Tableau Big Data Solution Showcase
From weeks to hours: how Tableau and AWS changed big data analytics
AWS Big Data Solution Showcase
The recording of this webinar is available here:
https://connect.awswebcasts.com/p8hwp1gyvtd
Introductions
• Paul Lilford
– Channel Director, Technology Partners, Tableau
• Dustin Smith
– Product Marketing Manager, Tableau
• Rahul Bhartia
– Ecosystem Solution Architect, AWS
Agenda
Everything you need to be up and running AWS + Tableau
– AWS big data related services
– Tableau analytics on AWS
– Live demo
– Customer success story: Mixpo
– Q & A
Big data & AWS
Technologies and techniques for working productively
with data, at any scale.
Big data and AWS Cloud computing
Big data
• Potentially massive datasets
• Iterative, experimental style of data manipulation and analysis
• Frequently not steady-state workload; peaks and valleys
• Hard to configure/manage the infrastructure
Cloud computing
• Massive, virtually unlimited capacity
• Iterative, experimental style of infrastructure deployment/usage
• Elasticity for highly variable workloads
• Managed services for data storage and analysis
AWS Data Services
Data
• Velocity: Millisecond, Second, Minute, Hour, Day
• Variety: Structured, Unstructured, Text, Binary
• Volume: Gigabytes, Terabytes, Petabytes
Services
• EC2, EBS: Instance
• Redshift, RDS: Relational
• EMR: Hadoop
• DynamoDB: NoSQL
• Kinesis: Stream
• S3, Glacier: Storage
• ElastiCache: Caching
• Data Pipeline: Orchestrate
Amazon S3
• Store anything
• Object storage
• Scalable
• Designed for 99.999999999% durability
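As a rough illustration of what the durability figure on the Amazon S3 slide implies, the arithmetic can be sketched as below. The object count is a hypothetical input, and the model (each object lost independently with an annual probability of 1 minus the durability) is a simplifying assumption for illustration, not a guarantee.

```python
from fractions import Fraction

# Durability figure quoted on the slide: 99.999999999% (eleven nines).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected objects lost per year under a simple independent-loss model."""
    return object_count * (1 - DURABILITY)


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


stored_objects = 10_000  # hypothetical number of stored objects
print(float(expected_annual_losses(stored_objects)))
print(int(years_per_expected_loss(stored_objects)))
```

Exact fractions are used instead of floats so that the eleven-nines figure is not distorted by rounding.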
Amazon Kinesis
• Real-time processing
• High throughput; elastic
• Easy to use
• EMR, S3, Redshift, DynamoDB Integration
Amazon DynamoDB
• NoSQL Database
• Seamless scalability
• Zero admin
• Single digit millisecond latency
Amazon Redshift
• Relational data warehouse
• Massively parallel
• Petabyte scale
• Fully managed
• $1,000/TB/Year
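To make the $1,000/TB/Year figure from the Amazon Redshift slide concrete, a back-of-the-envelope estimate might look like the sketch below. The warehouse size is a hypothetical input, and the slide's number is treated as a flat headline rate; actual pricing is not described in the webcast, so this is only a rough planning aid.

```python
# Headline rate quoted on the slide, in US dollars per terabyte per year.
PRICE_PER_TB_YEAR_USD = 1_000


def yearly_cost_usd(terabytes: float) -> float:
    """Yearly cost at the flat headline rate."""
    return terabytes * PRICE_PER_TB_YEAR_USD


def monthly_cost_usd(terabytes: float) -> float:
    """Yearly cost spread evenly over twelve months."""
    return yearly_cost_usd(terabytes) / 12


warehouse_tb = 20  # hypothetical warehouse size in terabytes
print(f"Yearly:  ${yearly_cost_usd(warehouse_tb):,.2f}")
print(f"Monthly: ${monthly_cost_usd(warehouse_tb):,.2f}")
```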
Amazon Elastic MapReduce
• Hadoop/HDFS clusters
• Hive, Pig, Impala, HBase
• Easy to use; fully managed
• On-demand and spot pricing
• S3, DynamoDB, Redshift and Kinesis
http://aws.amazon.com/marketplace
Big Data Case Studies
Learn from other AWS customers
aws.amazon.com/solutions/case-studies/big-data
Tableau & AWS
• The Opportunity of the Cloud
– Time to Implement
– Total Cost of Ownership
– Access. Anywhere. Anytime. Any Device.
Amazon Web Services and Tableau together make seeing, exploring, analyzing, and reporting off of Big Data an achievable everyday task for the everyday person.
Flexible
Transform all types of data into self-service analytics
[Diagram, built up across slides: Amazon EMR, Amazon RDS, and Amazon Redshift connect through ODBC to Tableau Desktop. Later builds add Tableau Server, Tableau Server inside a virtual private cloud, and Tableau Online.]
Demo
Tableau Desktop connected live to Amazon Redshift (Customer Behavior Metrics – credit application web session tracking)
Customer Success Story
Angie Moe
Director of Analytics, Mixpo
Replays
# of Clicks
Page Location
Volume
Before
• Mixpo Web Servers
• PostgreSQL Database Environment
• 1 Billion Views Monthly
• Query Times: Hours/Days
After
• Mixpo Web Servers
• PostgreSQL Database Environment
• Amazon Redshift
• Query Times: Minutes
Results of Mixpo’s Redshift + Tableau Implementation
Existing Analytics Faster
Innovation: Fraud Detection
New Methodology
How to Get Started
Curious?
Tableau: www.tableausoftware.com/products/trial
Redshift: http://aws.amazon.com/redshift/getting-started/
Additional Resources
Tableau + Amazon Redshift Solution Page: http://tabsoft.co/AWSRedshift
Tableau + Amazon Redshift Mixpo Case Study: http://tabsoft.co/AWSRedshiftMixpo
Tableau Getting Started Kit: http://tabsoft.co/20dayquickstart
Tableau + Redshift Test Drive: https://www.slalom.com/aws
Redshift FAQ Document: http://aws.amazon.com/redshift/faqs/
Q&A Session Transcription
Question · Answer(s) · Resource(s)
I've noticed that Tableau Extracts for detailed data take a long time to create (even using RDS). Any recommendations on how to reduce how long it takes to create the initial extract?
Tableau Data Extracts that take a long time to create can usually be traced back to one of two things: 1. a slow data environment, or 2. "Long Data", a table that has quite a few columns (100+). If RDS is the data source, then the number of columns might be the issue. Try excluding any columns you're not using in your analysis when you take a Tableau Data Extract. While setting up the extract, there is an option to hide unused columns, which keeps them out of the Tableau Data Extract.
* http://bensullins.com/leveraging-your-tableau-server-to-create-large-data-extracts/
Can we host Tableau Server locally within our internal network?
Tableau Server can absolutely be hosted internally within your organization's network and still take full advantage of hosted Amazon Web Services data environments like Redshift, EMR, and RDS. Depending on the use case for sharing interactive analytics, some Tableau customers deploy one instance of Tableau Server to an internal network for internal reporting and/or staging, and host a second instance of Tableau Server on an EC2 instance in order to serve customers or partners with analytic reports and applications without having to open ports in their firewall.
* http://www.tableausoftware.com/learn/whitepapers/ensuring-high-availability
* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf
What challenges do organizations face in adopting Tableau? Do you run into embedded structures that might be threatened by how it empowers non-technical end users?
Many organizations begin adopting both Tableau Desktop and Tableau Server from the business side, and after some time IT becomes involved to help manage and further support Tableau deployments. Often the IT group is very excited to help support Tableau adoption once they realize that it lets them focus on strategic projects instead of supporting analytic efforts (refreshing local data sources, working through the reporting queue, etc.). Because Tableau supports a true self-service Business Intelligence model where business users can engage with data directly, IT is able to stay focused on platform health. When Tableau is combined with AWS solutions like Redshift, EMR, and RDS, the overhead for IT to manage data environments becomes even lower. Hosting Tableau Server in AWS EC2 goes even further to help IT organizations manage the capabilities and costs of their overall platform.
* http://www.tableausoftware.com/drive
Where can I get more information about Tableau Server in a VPC?
Tableau has published a quickstart guide on hosting Tableau Server in the AWS cloud using a VPC. You can also refer to the walkthrough guide on our community forum page.
* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf
* http://community.tableausoftware.com/thread/135464
Is this HIPAA secure?
Tableau Answer: Tableau is used by many healthcare organizations in the United States that must meet HIPAA compliance. This is accomplished in several ways, all depending on the unique data environments and requirements of each institution. Please see the Tableau Forum thread where this is discussed by several of those healthcare institutions.
AWS Answer: Yes, Redshift is HIPAA compliant, and you can take advantage of features like built-in encryption to run HIPAA-compliant workloads on AWS.
* http://community.tableausoftware.com/message/194129
Using Tableau 8, it was not possible to mix data from Oracle and SQL Server in one analysis. Is this still true?
Tableau can take query results from multiple data sources such as Redshift, SQL Server, Oracle, Salesforce, Splunk, and Hadoop (to name a few) and aggregate them on the fly. We call this process Data Blending, and it requires no SQL query writing, since Tableau can dynamically detect like fields and use those as blending keys. This capability is especially powerful when you need to quickly evaluate the value and veracity of data sources that are candidates for adding to an Amazon Redshift environment.
* http://www.tableausoftware.com/videos/data-integration
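The blending idea described in this answer can be sketched in a few lines: each source is aggregated separately to the level of a shared field, and the aggregated results are then combined on that field. This is a minimal illustration of the general idea, not Tableau's implementation; the source names, field names, and the choice to let the first (primary) source decide which keys appear are assumptions made for the example.

```python
from collections import defaultdict


def aggregate(rows, key, measure):
    """Sum `measure` for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def blend(primary_rows, secondary_rows, key, primary_measure, secondary_measure):
    """Aggregate both sources on `key`, then attach secondary totals to primary keys."""
    primary = aggregate(primary_rows, key, primary_measure)
    secondary = aggregate(secondary_rows, key, secondary_measure)
    return [
        {key: k, primary_measure: total, secondary_measure: secondary.get(k)}
        for k, total in sorted(primary.items())
    ]


# Hypothetical rows standing in for two separate data sources.
redshift_sessions = [
    {"region": "East", "sessions": 120},
    {"region": "East", "sessions": 80},
    {"region": "West", "sessions": 50},
]
crm_revenue = [
    {"region": "East", "revenue": 1000.0},
    {"region": "North", "revenue": 300.0},
]

blended = blend(redshift_sessions, crm_revenue, "region", "sessions", "revenue")
for row in blended:
    print(row)
```

Keys that exist only in the secondary source ("North" here) do not appear in the result, and primary keys without a match get an empty value rather than being dropped.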
I'm using Tableau with Redshift with some billions of rows of aggregated data. The queries, especially when using joins, are tens of seconds or minutes -- which is just too much for explorative analysis (I'd want max 10 seconds per query). Are there easy ways to sample the data in Tableau?
Tableau doesn't have an automatic way to sample data from a connection. If query performance against a Redshift environment is an issue, I highly suggest exploring the tuning techniques listed in the joint Tableau and Amazon whitepaper.
*http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance
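Since the answer notes that Tableau does not sample automatically, one option is to sample upstream, before the data reaches Tableau. The sketch below shows deterministic hash-based sampling, which keeps the same subset of keys on every run. The technique, the field names, and the 10% rate are assumptions for illustration; they do not come from the webcast or the whitepaper.

```python
import hashlib


def in_sample(key: str, percent: int) -> bool:
    """Return True if `key` hashes into the first `percent` of 100 buckets."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def sample_rows(rows, key_field, percent):
    """Keep only rows whose key falls inside the sampled buckets."""
    return [row for row in rows if in_sample(str(row[key_field]), percent)]


# Hypothetical session rows; in practice these would come from the warehouse.
rows = [{"session_id": i, "clicks": i % 7} for i in range(10_000)]
sample = sample_rows(rows, "session_id", 10)
print(len(sample), "of", len(rows), "rows kept")
```

Sampling on a key such as a session identifier, rather than on individual rows, keeps all rows for a sampled session together, which matters when the sample is later joined to other tables.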
Can I build analytics in Tableau by connecting to an MDM source and big data information from AWS cloud services? How are the keys and joins resolved?
Tableau Answer: Tableau helps both business and IT groups jointly keep data safe and secure inside organizations. MDM solutions often play a role in how this is accomplished and often differ depending on the technology, approach, and goal of the organization itself.
Does any university teach Tableau?
Many universities have started incorporating Tableau into their academic programs for a variety of courses. In support of academic institutions using Tableau for learning environments, Tableau has started the "Tableau for Teaching" program, which allows any full-time student (elementary, high school, collegiate) as well as instructors at fully accredited institutions to use Tableau for free.
* http://www.tableausoftware.com/academic
Is it possible to get the number of CPUs on the Tableau Server that was handling 23 million rows?
Technical specification recommendations for Tableau Server implementations are readily available.
* http://www.tableausoftware.com/products/techspecs
Is this how it looks for an end user or is this the admin interface?
The majority of the demonstration during the webinar used Tableau Desktop, which is the report author's view. Views hosted on Tableau Server and designed purely for interactive consumption do not offer the authoring features seen in Tableau Desktop. Please see the accompanying link, which shows a final Tableau Server example.
*https://demodepot.tableausoftware.com/views/SecuritiesTechnical/1#1
Can you share a dashboard with another Tableau Desktop Professional user without creating an extract (by sharing the connection to Redshift)?
Tableau allows workbook files that do not require extracted data to be shared between Tableau Desktop users. The Tableau Workbook file (extension .twb) contains the analytics but no local data, just a record of how to connect back to Amazon Redshift.
*http://www.theinformationlab.co.uk/2013/12/02/tableau-file-types-and-extensions/
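Because a .twb workbook is an XML document, the point that it holds connection details rather than data can be illustrated by listing the connections it declares. The XML below is a simplified, hypothetical stand-in written for this example; real workbooks contain many more elements, and their element and attribute names may differ by Tableau version. The server and database names are placeholders.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical workbook XML; not the full .twb schema.
SAMPLE_TWB = """<workbook>
  <datasources>
    <datasource name='web_sessions'>
      <connection class='redshift' server='example-cluster.example.com'
                  port='5439' dbname='analytics' />
    </datasource>
  </datasources>
  <worksheets>
    <worksheet name='Sessions by Region' />
  </worksheets>
</workbook>
"""


def list_connections(twb_xml: str):
    """Return one dict per connection: the data source name plus its attributes."""
    root = ET.fromstring(twb_xml)
    return [
        {"datasource": ds.get("name"), **conn.attrib}
        for ds in root.iter("datasource")
        for conn in ds.iter("connection")
    ]


for connection in list_connections(SAMPLE_TWB):
    print(connection)
```

The sample contains only a description of where the data lives, so a colleague opening such a file still needs network access and credentials for the database it points to.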
How does the speed of querying a dataset on Redshift compare with querying a Tableau Data Extract data source on a Tableau server?
Performance of Tableau queries against Amazon Redshift as a data source vs. a Tableau Data Extract hosted on Tableau Server depends entirely on the type of data and the complexity of the query. From a scalability standpoint, Amazon Redshift may be the better choice for bigger datasets given its ability to elastically provision more compute power.
If you are building out a data model for Tableau dashboards, should you use vertical or horizontal data structures for your data marts?
Tableau works best with vertical data structures.
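The answer does not define the terms, so the sketch below assumes that "vertical" means a tall layout with one row per measurement and "horizontal" means a wide layout with one column per measure. Under that assumption, reshaping a wide table into a tall one might look like this; the field names are hypothetical.

```python
def unpivot(rows, id_fields, name_field="metric", value_field="value"):
    """Turn each non-identifier column of a wide row into its own tall row."""
    tall = []
    for row in rows:
        ids = {field: row[field] for field in id_fields}
        for column, value in row.items():
            if column not in id_fields:
                tall.append({**ids, name_field: column, value_field: value})
    return tall


# Hypothetical wide (horizontal) table: one column per measure.
wide = [
    {"campaign": "A", "replays": 12, "clicks": 340},
    {"campaign": "B", "replays": 7, "clicks": 125},
]

tall = unpivot(wide, id_fields=["campaign"])
for row in tall:
    print(row)
```

In the tall form, adding a new measure means adding rows rather than columns, so the table's schema stays the same as measures are added.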
How does the Mac version of Tableau connect to Redshift?
Tableau Desktop Professional for the Mac uses the same ODBC-based connection approach for working with Amazon Redshift as Tableau Desktop for Windows does.
You can find the drivers here: http://www.tableausoftware.com/support/drivers
Do you have a testing version of Tableau, with some test datasets, that would allow one to practice designing dashboards and day-to-day analytics, please? :)
For anyone interested in using Tableau to experiment with building visual analytics and leveraging Amazon Redshift, I highly recommend trying the AWS test drive page set up by Slalom Consulting.
* https://www.slalom.com/aws
We have a non-performing platform hosted locally, with slow response times when you interact in Tableau. Would simply putting the Tableau extract on Redshift result in a boost in performance?
Tableau Answer: If the data environment your organization is using internally is slow or not set up for analytics, I would recommend looking into Amazon Redshift or RDS. Neither of these options would even require you to take a Tableau Data Extract. Tableau customer Mixpo had almost exactly this scenario and saw tremendous results using Redshift.
*http://www.tableausoftware.com/learn/webinars/explore-big-data-analytics-amazon-redshift
Can Tableau Server be clustered for HA?
Tableau Answer: Tableau can absolutely be clustered to ensure an HA (Highly Available) environment. There are no restrictions on the Tableau side, but there are cursor limitations on the Redshift side. Please refer to the whitepaper for more details.
*http://www.tableausoftware.com/sites/default/files/whitepapers/high_availablility_reduced_downtime.pdf
What challenges can one encounter while working with Tableau on Redshift? How complete is the integration of Tableau and Redshift? For example, are all the analytical functions that Tableau generates in its SQL available in Redshift?
Tableau Answer: Every organization's data and analytical requirements are unique. Knowing how to tune performance in both Tableau and Redshift is very helpful and is covered in the joint Tableau and Amazon Redshift whitepaper.
*http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance
THANK YOU