Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine Learning | AWS...

Greg Khairallah, Business Development Manager, AWS

Malini Saxena, Senior Consultant, AWS

Raj Chary, VP of Technology / Architecture, WagglePractice

Lige Hensley, Chief Technology Officer, Ivy Tech

June 20, 2016

Easy Analytics with AWS

What to expect from this session

• AWS toolkit for analytics

• Understand stakeholders

• Demo

• Case Study – WagglePractice

• Case Study – Ivy Tech

• Q&A

Big data portfolio, but what do I recommend?

[Diagram: AWS big data services arranged in Collect, Store, and Analyze stages: AWS Direct Connect, AWS Import/Export, AWS Database Migration Service, Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Aurora, Amazon CloudSearch, Amazon Elasticsearch Service, Amazon EMR, Amazon EC2, Amazon Redshift, Amazon Machine Learning, Amazon Kinesis Analytics, Amazon QuickSight, AWS Data Pipeline]

Match toolset to right persona

• Business intelligence (BI) analyst
  – Primary tool is SQL
  – Historical data resides in a data warehouse such as Amazon Redshift
• Data scientist: uses programming languages such as R or Python
• Application developer: requires APIs to integrate with AWS services

BI analyst

BI analyst with existing BI tools

[Diagram: BI analyst using BI tools on Amazon EC2 or Amazon QuickSight (including the QuickSight API), connected to Amazon Redshift]

• Primary tool is SQL
• Data is largely structured, with well-known data sources
• Primary concern is fast, consistent performance
• Need to extend SQL with custom functions

Amazon Redshift system architecture

Leader node
• SQL endpoint
• Stores metadata
• Coordinates query execution

Compute nodes
• Local, columnar storage
• Execute queries in parallel
• Load, backup, restore via Amazon S3; load from Amazon DynamoDB, Amazon EMR, or SSH

Two hardware platforms
• Optimized for data processing

• DS2: HDD; scale from 2 TB to 2 PB

• DC1: SSD; scale from 160 GB to 326 TB

[Diagram: clients connect to the leader node over JDBC/ODBC; the nodes are linked by a 10 GigE network]
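To make the leader/compute split concrete, here is a toy, single-process model of the scatter-gather pattern the slide describes: rows are spread across hypothetical "slices" by a distribution key, each slice aggregates only its local rows in parallel, and a leader step merges the partial results. This is a sketch of massively parallel processing in general, assuming a simple hash distribution; it is not Amazon Redshift's implementation, and all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, dist_key, num_slices):
    """Spread rows across slices by hashing a distribution key (hypothetical scheme)."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[hash(row[dist_key]) % num_slices].append(row)
    return slices


def partial_sum(slice_rows, group_col, value_col):
    """Compute-node step: aggregate only the rows stored on this slice."""
    acc = defaultdict(float)
    for row in slice_rows:
        acc[row[group_col]] += row[value_col]
    return acc


def leader_query(rows, dist_key, group_col, value_col, num_slices=4):
    """Leader-node step: fan work out to slices in parallel, then merge partial results.

    Roughly: SELECT group_col, SUM(value_col) FROM rows GROUP BY group_col
    """
    slices = distribute(rows, dist_key, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(lambda s: partial_sum(s, group_col, value_col), slices))
    merged = defaultdict(float)
    for part in partials:
        for group, total in part.items():
            merged[group] += total  # the same group may appear on several slices
    return dict(merged)


if __name__ == "__main__":
    sales = [
        {"order_id": 1, "region": "east", "amount": 10.0},
        {"order_id": 2, "region": "west", "amount": 5.0},
        {"order_id": 3, "region": "east", "amount": 2.5},
        {"order_id": 4, "region": "north", "amount": 7.0},
    ]
    # Distribute on order_id but group on region, so partial sums must be merged.
    print(leader_query(sales, "order_id", "region", "amount"))
```

Distributing on a different column than the one being grouped forces the merge step to do real work; distributing on the grouping column would let each slice finish its groups locally.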

New SQL functions

We add SQL functions regularly to expand Amazon Redshift’s query capabilities

Added 25+ window and aggregate functions since launch, including:

LISTAGG

[APPROXIMATE] COUNT

DROP IF EXISTS, CREATE IF NOT EXISTS

REGEXP_SUBSTR, _COUNT, _INSTR, _REPLACE

PERCENTILE_CONT, _DISC, MEDIAN

PERCENT_RANK, RATIO_TO_REPORT

We’ll continue iterating, but we also want to enable you to write your own functions

Window function examples: http://docs.aws.amazon.com/redshift/latest/dg/r_Window_function_examples.html
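The Python sketch below spells out what several of these functions compute for a single group of rows, which can help when sanity-checking query results outside the database. It assumes the definitions given in the Amazon Redshift documentation (for example, PERCENTILE_CONT interpolates at row number RN = 1 + P × (N − 1), and MEDIAN is PERCENTILE_CONT(0.5)); it ignores PARTITION BY, window frames, and most NULL handling, and the function names are illustrative, not Redshift APIs.

```python
import math


def percentile_cont(values, p):
    """PERCENTILE_CONT(p): interpolate between rows FLOOR(RN) and CEILING(RN), RN = 1 + p * (N - 1)."""
    ordered = sorted(v for v in values if v is not None)  # NULLs are ignored
    if not ordered:
        return None
    rn = 1 + p * (len(ordered) - 1)
    frn, crn = int(math.floor(rn)), int(math.ceil(rn))
    low, high = ordered[frn - 1], ordered[crn - 1]  # row numbers are 1-based
    if frn == crn:
        return low
    return (crn - rn) * low + (rn - frn) * high


def median(values):
    """MEDIAN is PERCENTILE_CONT(0.5)."""
    return percentile_cont(values, 0.5)


def percent_rank(values):
    """PERCENT_RANK: (rank - 1) / (N - 1) per row; tied values share the lowest rank."""
    n = len(values)
    ordered = sorted(values)
    # ordered.index(v) counts the values strictly smaller than v, i.e. rank - 1
    return [0.0 if n == 1 else ordered.index(v) / (n - 1) for v in values]


def ratio_to_report(values):
    """RATIO_TO_REPORT: each value divided by the sum of all values in the group."""
    total = sum(values)
    return [v / total for v in values]


def listagg(values, delimiter=","):
    """LISTAGG(expr, delimiter) WITHIN GROUP (ORDER BY expr): ordered, delimited string."""
    return delimiter.join(str(v) for v in sorted(values))


if __name__ == "__main__":
    print(percentile_cont([1, 2, 3, 4], 0.5))  # interpolates between 2 and 3
    print(percent_rank([10, 20, 20, 30]))
    print(listagg(["math", "ela"], ";"))
```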

Scalar user defined functions

You can write UDFs using Python 2.7

• Syntax is largely identical to PostgreSQL UDF syntax

• Python execution is performed in parallel

• System and network calls within UDFs are prohibited

• Comes integrated with the Pandas, NumPy, SciPy, DateUtil, and Pytz analytic libraries

• Import your own libraries for even more flexibility

• Take advantage of thousands of functions available through Python libraries to perform operations not easily expressed in SQL
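As a sketch of the kind of logic a scalar Python UDF can hold, the function below computes great-circle distance with the haversine formula, which is awkward to write in plain SQL. The function name `f_distance_km`, the Earth-radius constant, and the DDL in the docstring are illustrative assumptions. The body uses only the standard `math` module and is written to run under both Python 2.7 (which Redshift UDFs use) and Python 3, so it can be tested locally before being wrapped in `CREATE FUNCTION ... LANGUAGE plpythonu`.

```python
import math


def f_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees, via haversine.

    Hypothetical Redshift wrapper for the same body (illustrative, not tested on a cluster):
        CREATE FUNCTION f_distance_km (lat1 float, lon1 float, lat2 float, lon2 float)
        RETURNS float IMMUTABLE AS $$
            import math
            ...the statements below...
        $$ LANGUAGE plpythonu;
    """
    # A SQL NULL arrives in a Python UDF as None; returning None yields NULL.
    if lat1 is None or lon1 is None or lat2 is None or lon2 is None:
        return None
    earth_radius_km = 6371.0  # mean Earth radius (assumed constant)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(round(f_distance_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```

Because the UDF runs once per row on the compute nodes, keeping the body free of system and network calls (as the slide requires) also keeps it easy to unit-test like this.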

A very fast, cloud-powered business intelligence service for 1/10 the cost of traditional BI software

What is Amazon QuickSight?

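[Diagram: Amazon QuickSight architecture. Business users reach QuickSight through the QuickSight UI in web browsers and on mobile devices; partner BI products connect through the QuickSight API. Components shown: SPICE, connectors, data prep, metadata, and suggestions. Data sources: Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, files, and third-party sources.]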

Data scientist

Data scientist with existing toolsets

[Diagram: data scientist using toolkits such as SAS or RStudio installed on Amazon EC2, working with unstructured data in Amazon S3 and structured data in Amazon Redshift]

• Work with unstructured datasets

• Use existing toolsets to connect to Amazon Redshift

Querying Amazon Redshift with R packages

• RJDBC: supports SQL queries
• dplyr: uses R code for data analysis
• RPostgreSQL: R-compliant driver or Database Interface (DBI)

[Diagram: R user in RStudio connected to unstructured data in Amazon S3, user profile data in Amazon RDS, and Amazon Redshift]

Connecting R with Amazon Redshift blog post: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift

Querying Amazon Redshift with R packages example

Application developer

Application developers can build smart applications using Amazon Machine Learning

[Diagram: an application generates and queries predictions through Amazon Machine Learning; structured data and predictions are stored in Amazon Redshift and visualized in Amazon QuickSight]

• All skill levels

• Amazon Machine Learning technology is accessed through APIs and SDKs

• Embed visualizations in applications
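For the API route in these bullets, the AWS SDK for Python (boto3) exposes Amazon ML real-time predictions as `predict` on the `machinelearning` client, taking `MLModelId`, `Record` (a map of attribute names to string values), and `PredictEndpoint`; the model needs a real-time endpoint before `predict` can be called. The sketch below wraps that call. The client is passed in so the example runs offline with a fake; the model ID, endpoint URL, feature names, and the choice to drop missing values are hypothetical.

```python
def to_record(features):
    """Build the Record map for a real-time prediction.

    Amazon ML expects attribute values as strings; dropping None-valued
    attributes (treating them as missing) is an assumption of this sketch.
    """
    return {name: str(value) for name, value in features.items() if value is not None}


def predict_label(client, model_id, endpoint_url, features):
    """Request one real-time prediction and return (predicted label, per-label scores)."""
    response = client.predict(
        MLModelId=model_id,
        Record=to_record(features),
        PredictEndpoint=endpoint_url,
    )
    prediction = response["Prediction"]
    return prediction.get("predictedLabel"), prediction.get("predictedScores", {})


class FakeMLClient:
    """Offline stand-in for boto3.client("machinelearning") that records the last request."""

    def __init__(self):
        self.last_request = None

    def predict(self, **kwargs):
        self.last_request = kwargs
        return {"Prediction": {"predictedLabel": "1", "predictedScores": {"1": 0.87}}}


if __name__ == "__main__":
    client = FakeMLClient()  # swap for boto3.client("machinelearning") against a real endpoint
    label, scores = predict_label(
        client,
        model_id="ml-EXAMPLE",  # hypothetical model ID
        endpoint_url="https://realtime.example.invalid",  # hypothetical; use the model's endpoint URL
        features={"minutes_practiced": 42, "hints_used": 3, "last_score": None},
    )
    print(label, scores, client.last_request["Record"])
```

Injecting the client keeps the request-building logic testable; the predictions could then be written to Amazon Redshift for visualization in QuickSight, as in the diagram.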

Raj Chary, WagglePractice

Vice President of Technology/Architecture

Smart, responsive practice

Math and ELA (Grades 2-8)

Provides students the right challenge at the right time

What is Waggle?

Right Challenge, Right Time

Waggle looks for more than correct answers. Waggle continually analyzes each student’s decisions and progress. That way, students get tougher material right when they’re ready.

What is Waggle?

Productive Struggle

Waggle motivates students to push themselves forward. How? Through helpful hints, supportive feedback, and achievement badges that build grit and confidence.

What is Waggle?

Constructive Grouping

Waggle’s insights mean you can easily group students together based on learning needs, all without sacrificing the quality of individual instruction.

What is Waggle?

Waggle: Product Demo

• Data creators
  – Differentiated learning experience
  – Fun and engaging
• Data visualizers
  – Seamless integration with application
  – Analytics with a story
  – Actionable data

Redshift: Data Warehouse Layout

[Diagram: three Amazon Redshift clusters
• Write cluster: Compute, dw2.large
• Read cluster: Compute, dw2.large; serves Jaspersoft reports
• History cluster: Density, dw1.xlarge; holds periodic data snapshots for historical analysis
Data flows shown: initial and incremental (processed) data loads from the data sources via S3 COPY into staging; data marts (aggregations); S3 UNLOAD and LOAD between clusters, including UPSERTS]
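The UPSERTS in this layout reflect that Redshift, at the time of this talk, had no single MERGE statement; the Redshift documentation instead describes merging through a staging table: load the changed rows into staging, then, inside one transaction, delete the matching rows from the target and insert the staged rows. The sketch below reproduces that delete-then-insert merge with Python's built-in `sqlite3` as a stand-in database so it runs anywhere. The table and column names are hypothetical; on Redshift the staging table would be filled by COPY from S3, and the delete is typically written as `DELETE FROM target USING staging WHERE target.key = staging.key`.

```python
import sqlite3


def staged_upsert(conn, target, staging, key):
    """Merge staging into target: delete rows whose key is staged, insert staged rows, clear staging.

    Table and column names are interpolated directly, so they must be trusted identifiers.
    """
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute(f"DELETE FROM {target} WHERE {key} IN (SELECT {key} FROM {staging})")
        conn.execute(f"INSERT INTO {target} SELECT * FROM {staging}")
        conn.execute(f"DELETE FROM {staging}")


def demo():
    conn = sqlite3.connect(":memory:")
    for table in ("student_progress", "student_progress_stage"):
        conn.execute(f"CREATE TABLE {table} (student_id INTEGER, mastery REAL)")
    conn.executemany("INSERT INTO student_progress VALUES (?, ?)", [(1, 0.4), (2, 0.7)])
    # Incremental batch: student 2 changed, student 3 is new.
    conn.executemany("INSERT INTO student_progress_stage VALUES (?, ?)", [(2, 0.9), (3, 0.5)])
    conn.commit()
    staged_upsert(conn, "student_progress", "student_progress_stage", "student_id")
    return conn.execute("SELECT * FROM student_progress ORDER BY student_id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Running both statements in one transaction means report queries never see a target table where the old rows are deleted but the new rows are not yet inserted.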

Results and Lessons Learned

• Performance Metrics
  – Millions of records are processed in under 1 minute (LOAD/UNLOAD commands, UPSERTS, S3 COPY command)
  – Report queries average from under 1 second to about 1.5 seconds
  – {compression}: gained 20+% efficiency in data retrieval
• Best Practices
  – {sort keys}: lens-based data model to visualize data in a variety of ways
  – {commit stats}: Redshift is not a transactional system
  – {nested loops}: avoid Cartesian products; ensure joins are well managed
  – {queries that queue}: tune the WLM configuration
  – {query runtimes}: faster queries mean less queuing
  – {stats missing}: run ANALYZE and VACUUM when possible
  – {alerts with tables}: monitor to ensure queries are running optimally
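One reason the {sort keys} and {compression} items reinforce each other: Redshift stores each column separately, and encodings such as run-length encoding shrink a column sharply when equal values sit next to each other, which is exactly what sorting on that column produces. The toy encoder below is plain Python, not Redshift's on-disk format, and counts runs for the same hypothetical column in load order and in sorted order.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


if __name__ == "__main__":
    # Hypothetical low-cardinality column as it might arrive in load order.
    subject = ["math", "ela", "math", "ela", "ela", "math", "math", "ela"]
    print(len(run_length_encode(subject)), "runs unsorted")  # 6
    print(len(run_length_encode(sorted(subject))), "runs when sorted on this column")  # 2
```

Fewer runs means fewer blocks to read, which is also why stale sort order (fixed by VACUUM) can erode both compression and query speed.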

Thank You

Ivy Tech & Amazon Redshift

May 25, 2016

• Transforming the culture of the College to be more data driven

• Moving from reporting silos to an Integrated Analytics system, which we call a Data Democracy

• Collecting and analyzing a vast variety of data at a scale that no one else in Higher Ed is doing

• Using machine learning tools to identify students who may need further assistance

• Starting this fall, we are implementing a one-on-one coaching initiative for the students we identified with the machine learning tools

What We’re Doing

96% of organizations in the United States use data in the same way.

…and it’s wrong.

But it’s not just education…

The “Standard” Approach

Relevant Data for Everyone

Data Regimes

Data Dictatorship: Data is controlled and its use is restricted. There is an asymmetric distribution of information based on your position

Data Aristocracy: Data analysts, scientists and PhDs are needed to do anything meaningful. Power concentrates in the hands of these employees and their supervisors

Data Anarchy: Business users feel underserved and take matters into their own hands. They create “shadow IT” systems and work around the “unresponsive” IT group

Data Democracy: Everybody gets timely and equitable access to data. Line of business users are empowered and “own” the data. Executives and IT get out of the way

1 Shash Hegde, Mariner, “The Rise of Data Regimes”, 9/12/13, http://www.mariner-usa.com/rise-data-regimes/ (image substitution for Mao Zedong)

Every organization moves through increasingly complex stages of data accessibility.

Data Maturity Model

… very few complete the transition to Integrated Analytics

Stage 1: Report Silos

[Diagram: Banner, Blackboard, Luminis, Starfish, SCCM, and CAS as separate reporting silos, alongside Request Tracker and Authentication]

This is what we have had for decades at Ivy Tech…

Stage 2: Data Warehousing

This is what companies [...], but we are taking this a step further…

Stage 3: Integrated Analytics

[Diagram: Request Tracker and Authentication alongside curated data collections labeled Students by Financial, Students, and Classes by Section]

These curated collections of data are designed to enable direct access to the data you need, regardless of where it came from. Quickly. Easily.

GPA Graduation - Cumulative

Graduation Grade Point Average (Cumulative) is an indication of a student's academic progress for all semester credit classes for all registered terms up to and including the selected term. Letter grades are assigned points (A=4, B=3, C=2, D=1, F=0), and the GPA is calculated by dividing the number of grade points a student earned in a selected period of time by the total number of classes taken during that same period.

GPA Graduation Cumulative = Sum of a student's total grade points earned in credit classes for all registered terms up to and including the selected term / Sum of the student's total classes taken during that same period

NOTES ON USING THIS TERM: GPA Graduation - Cumulative does not include grades from remedial classes.

Related Terms: [GPA Graduation - Term]
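The definition above translates directly into code. This sketch follows it as written: it divides total grade points by the number of classes (not credit hours), counts only terms up to and including the selected term, and skips remedial classes. The record layout, the integer term encoding, and raising an error on non-letter grades such as W are assumptions for illustration, not Ivy Tech's implementation.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa_graduation_cumulative(records, selected_term):
    """Cumulative graduation GPA up to and including selected_term.

    records: iterable of (term, letter_grade, is_remedial) for a single student.
    Terms are assumed to compare in chronological order (e.g. 201610 < 201620).
    """
    grade_points = 0
    classes = 0
    for term, grade, is_remedial in records:
        if term > selected_term or is_remedial:
            continue  # later terms and remedial classes are excluded
        if grade not in GRADE_POINTS:
            # e.g. W or I; how such grades are handled is an assumption of this sketch
            raise ValueError("no grade points defined for %r" % grade)
        grade_points += GRADE_POINTS[grade]
        classes += 1
    return grade_points / classes if classes else None
```

For example, A, B, and C in credit classes plus a remedial D yields (4 + 3 + 2) / 3 = 3.0, because the remedial class is excluded from both the numerator and the denominator.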

Questions?

Resources

Amazon Redshift Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/getting-started.html

Scalar UDF Documentation: http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html

Introduction to Python UDFs in Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1IHV1G67CY53T/Introduction-to-Python-UDFs-in-Amazon-Redshift

Connecting R with Amazon Redshift: https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift

Databricks Apache Spark–Amazon Redshift Tutorial: https://github.com/databricks/spark-redshift/tree/master/tutorial

Amazon ML Getting Started Guide: https://aws.amazon.com/machine-learning/getting-started/

Amazon QuickSight (Preview Registration): https://aws.amazon.com/quicksight/

Thank you!
