SAS pre Big Data · Data Mart Analytic Mart Analytic Mart BI and Analytics Unstructured,...
Copyright © 2012, SAS Institute Inc. All rights reserved.
SAS FOR BIG DATA
PRESENTED BY: BRAD HATHAWAY
SAS AND BIG DATA: SOME KEY TAKEAWAYS FROM THE VIDEO
• Combining Big Data and Analytics
• Hadoop allows capturing unlimited amounts of diverse data – many
companies are using this to create a “Data Lake”
• Extracting value from the lake requires analytics, which makes SAS a
natural complement to Hadoop
One thing the video didn’t mention: the longer the data stays in the Data Lake, the better your performance and overall experience will be.
It is critical to have as much processing in Hadoop as possible.
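The point about keeping processing where the data lives can be illustrated with a small sketch. This is not SAS or Hadoop code: an in-memory SQLite table stands in for the data store, and the table name `events`, its columns and the row count are all assumptions made for the illustration. It only shows that pushing a summary down into the store moves far fewer rows than pulling everything out first.

```python
import sqlite3

def build_store(n_rows=10_000):
    """Create an in-memory table standing in for data held in the lake (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
    regions = ["north", "south", "east", "west"]
    rows = [(regions[i % 4], float(i % 100)) for i in range(n_rows)]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn

def pull_then_aggregate(conn):
    """Move every row to the client, then summarize there."""
    rows = conn.execute("SELECT region, amount FROM events").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)

def aggregate_in_store(conn):
    """Push the summary down so only the result rows leave the store."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM events GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)

conn = build_store()
client_totals, rows_moved_pull = pull_then_aggregate(conn)
store_totals, rows_moved_push = aggregate_in_store(conn)
print(rows_moved_pull, rows_moved_push)
```

Both functions produce the same totals; the difference is how many rows cross the boundary between the store and the client, which is the cost that grows with the size of the lake.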
SAS BUSINESS ANALYTICS FRAMEWORK
... GIVES SAS CUSTOMERS THE POWER TO KNOW!
• Each area is a market on its own!
• SAS is ranked as a leader in pretty
much all of them!
• Our customers are now shifting their
attention to how each of these areas
interact with Hadoop!
AGENDA
• What is Hadoop? (a quick refresher)
• Two Hadoop Approaches
• Data Platforms with Hadoop
• BI & Analytics on Hadoop
• SAS on Hadoop – a taste of
technology
• Data Quality Accelerator on Hadoop
• Self-Service DI on Hadoop
• SAS Visual Statistics
WHAT IS HADOOP?
A QUICK REFRESHER
WHAT IS HADOOP? DICTIONARY DEFINITION
“Hadoop is one way of using a set of cheap
computers to store an enormous amount of data
and then to process that data in parallel.”
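The definition above (store the data in pieces, then process the pieces in parallel) can be sketched in a few lines. This is only a toy illustration of the map and reduce idea, not Hadoop itself: a thread pool stands in for the cluster, and the function names and sample lines are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(lines, n_blocks):
    """Spread the lines over n_blocks chunks, as if stored on separate machines."""
    return [lines[i::n_blocks] for i in range(n_blocks)]

def map_block(block):
    """Map step: count words within one block, independently of the others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, n_blocks=3):
    """Run the map step on every block in parallel, then reduce the partial results."""
    blocks = split_into_blocks(lines, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)

lines = [
    "big data needs analytics",
    "hadoop stores big data",
    "sas adds analytics",
]
totals = word_count(lines)
print(totals.most_common(3))
```

Because each block is counted without reference to the others, adding more blocks (or, in a real cluster, more machines) does not change the result, only how the work is divided.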
WHAT IS HADOOP? MAKING HADOOP EASY AND ENTERPRISE READY…
WHAT IS HADOOP? AS A DATA PLATFORM, STORAGE COSTS ARE MUCH LOWER…
[Chart: Total Cost ($0 to $18,000,000) against Number of Gigabytes (1, 10, 100, 1000) for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]
WHAT IS HADOOP? PROJECTS FOR THE HADOOP STACK
SUPPORTING EVIDENCE: THE TREND IS UP!
Source: SandHill Group, Do You Hadoop? A Survey of Big Data Practitioners October 29, 2013
[Diagram: analytics lifecycle: IDENTIFY / FORMULATE PROBLEM → DATA PREPARATION → DATA EXPLORATION → TRANSFORM & SELECT → BUILD MODEL → VALIDATE MODEL → DEPLOY MODEL → EVALUATE / MONITOR RESULTS]
TWO STARTING POINTS: NOT MUTUALLY EXCLUSIVE… BUT OFTEN NOT SEEN TOGETHER!
• Hadoop as a Data Platform (standalone or as part of a broader ecosystem)
• Hadoop as a core component of the next generation of BI and Analytics
.. to support innovative business usage
.. to support an IT Transformation
WHERE ARE WE TODAY? SETTING THE SCENE
• Operational Data Sources:
• Traditional sources include ERP, CRM and
financial systems amongst others.
• Evolving sources that include unstructured
data from places like Twitter, LinkedIn etc.
and streaming data from the Internet of
Things (sensors etc.)
WHERE ARE WE TODAY? SETTING THE SCENE
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
Unstructured, semi-structured and streaming data (e.g. sensor data) is often handled outside the warehouse flow
WHERE DOES HADOOP FIT? HADOOP AS A “NEW DATA” STORE
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP AS AN ADDITIONAL INPUT TO THE EDW
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A BASIS FOR BI AND ANALYTICS
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A “STAGING LAYER” AS PART OF A “DATA LAKE” – Downstream stores could be Hadoop, data appliances or an RDBMS
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
HIGH LEVEL VIEW: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP AS A DATA PLATFORM”
• BASE SAS: MapReduce + Pig Scripting + HDFS Commands
• SAS/Access to Hadoop: Hive, Hive2 + Direct file access
• SAS/Access to Impala (Cloudera only)
• SAS Data Integration Studio (Transforms) in Data Management Standard / Advanced:
• SAS Federation Server: Virtual and secure access to Hadoop and more traditional sources
• SAS Event Stream Processing Engine: To bring streaming data from Sensors into Hadoop
Today...
• Read/Write HDFS files
• Submit HiveQL code
• Execute MapReduce code
• Submit Pig Latin
• Transfer data to/from Hadoop using Hadoop utilities
• SQL transforms pushed down with Access to Hadoop engine
Coming very soon
Everything we have today plus...
• SAS Data Quality Accelerator for Hadoop: Execute selected DQ routines in Hadoop
• SAS Code Accelerator for Hadoop: Execute SAS DS2 code in Hadoop
• New Web Based Business User Interface
  • Point and click data management routines where data stays in Hadoop
  • HTML 5 Web based interface
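For readers less familiar with the Hadoop side, the "Today" actions listed above correspond roughly to standard Hadoop command-line tools. The sketch below is not SAS syntax and does not show how SAS submits this work; it only builds the equivalent command lines (`hdfs dfs -put` / `-get`, `hive -e`, `pig -f`, `hadoop jar`) without running them. All paths, the jar name and the class name are hypothetical.

```python
import shlex
import subprocess

def hdfs_put(local_path, hdfs_path):
    """Copy a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]

def hdfs_get(hdfs_path, local_path):
    """Copy a file from HDFS to the local file system."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]

def hive_query(hiveql):
    """Submit a HiveQL statement through the Hive command line."""
    return ["hive", "-e", hiveql]

def pig_script(script_path):
    """Submit a Pig Latin script file."""
    return ["pig", "-f", script_path]

def mapreduce_job(jar_path, main_class, *args):
    """Execute a MapReduce job packaged as a jar."""
    return ["hadoop", "jar", jar_path, main_class, *args]

def run(cmd, dry_run=True):
    """Return the shell-quoted command line; execute it only when dry_run is False."""
    if not dry_run:
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)

# Hypothetical paths and names, shown as a dry run only.
plan = [
    run(hdfs_put("sales.csv", "/data/raw/sales.csv")),
    run(hive_query("SELECT COUNT(*) FROM sales")),
    run(pig_script("clean_sales.pig")),
    run(mapreduce_job("job.jar", "com.example.Job", "/data/raw", "/data/out")),
    run(hdfs_get("/data/out/part-00000", "result.txt")),
]
for line in plan:
    print(line)
```

Keeping `dry_run=True` as the default means the sketch can be read and tested without a cluster; on a machine with the Hadoop clients installed, passing `dry_run=False` would execute each command.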
WHEN IT COMES TO BI / REPORTING: TWO SIMPLE THINGS TO REMEMBER
A. Data for data visualization and reporting is sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop
  • SAS/Access, just like we do with an RDBMS
  • More or less business as usual
B. Hadoop cluster processors are used for data visualization, exploration and reporting
  • In-Memory
  • Transformational
WHEN IT COMES TO BI / REPORTING: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP” AS PART OF BI
A. Data for data visualization and reporting is sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop
Any SAS BI Product:
• SAS Visual Analytics
• SAS Office Analytics
• SAS Enterprise Guide
• SAS BI/EBI Server
• SAS Stored Processes and batch programs for reporting
B. Hadoop cluster processors are used for data visualization, exploration and reporting
In-Memory Exploration, Visualization & Reporting:
• SAS Visual Analytics
WHEN IT COMES TO ANALYTICS: THREE SIMPLE THINGS TO REMEMBER
C. Data for Analytics is sourced from Hadoop, but no Analytics is running on Hadoop
  • Think SAS/Access, just like we do with an RDBMS
  • More or less business as usual
D. Hadoop cluster processors are used for Analytical Computation
  • Think In-Memory Analytics
  • Transformational
E. Analytics deployed for batch execution in Hadoop
  • Think In-Database, just like with an RDBMS
  • Operational
WHEN IT COMES TO ANALYTICS: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP” AS PART OF ANALYTICS
C. Data for Analytics is sourced from Hadoop, but no Analytics is running on Hadoop
Any SAS Analytics Product:
• SAS Enterprise Miner
• SAS Forecast Server
• SAS/STAT etc.
D. Hadoop cluster processors are used for Analytical Computation
In-Memory Interactive Analytics:
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop
E. Analytics deployed for batch execution on Hadoop
Operational Analytics:
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
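The third option above, deploying analytics for batch execution where the data sits, can be pictured as shipping a scoring function to each partition of the data instead of shipping the data to the model. The sketch below is a generic illustration of that idea and not how the SAS accelerators are implemented: the model coefficients, the field names `recency` and `spend`, and the use of a thread pool in place of cluster nodes are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical coefficients of an already-trained logistic regression model.
MODEL = {"intercept": -1.0, "weights": {"recency": -0.5, "spend": 0.02}}

def score_record(record, model=MODEL):
    """Apply the logistic model to one record and return a probability."""
    z = model["intercept"] + sum(
        weight * record[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def score_partition(partition):
    """Score every record of one partition; this runs where the partition lives."""
    return [dict(record, score=score_record(record)) for record in partition]

def score_all(partitions):
    """Score all partitions in parallel and keep the results partitioned."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(score_partition, partitions))

# Two small partitions standing in for blocks of a much larger table.
partitions = [
    [{"id": 1, "recency": 0, "spend": 50}, {"id": 2, "recency": 4, "spend": 10}],
    [{"id": 3, "recency": 1, "spend": 200}],
]
scored = score_all(partitions)
for part in scored:
    for row in part:
        print(row["id"], round(row["score"], 3))
```

Only the small model definition travels to the data; the scored rows stay partitioned, which is what makes this pattern suitable for batch scoring of large tables.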
THE ANALYTICS LIFECYCLE: STRATEGY: ENABLE THE ENTIRE LIFECYCLE ON HADOOP
[Diagram: IDENTIFY / FORMULATE PROBLEM → DATA PREPARATION → DATA EXPLORATION → TRANSFORM & SELECT → BUILD MODEL → VALIDATE MODEL → DEPLOY MODEL → EVALUATE / MONITOR RESULTS]
SAS Visual Analytics
SAS Visual Statistics
SAS In-Memory Statistics for Hadoop
Done using either the Data Preparation, Data Exploration or Build Model Tools
SAS High Performance Analytics Offerings supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc.
Done using the Build Model Tools and other checks
SAS Scoring Accelerator for Hadoop
SAS Code Accelerator for Hadoop
SAS Visual Analytics
SAS ON HADOOP
A TASTE OF TECHNOLOGY
SAS DI STUDIO FLOW INCLUDING HADOOP DATA
[Diagram: ORACLE, DB2, SAS, SAP, HADOOP, TERADATA, SAS FEDERATION SERVER]
Access Hadoop
Combine with other data, Transform & Load
SAS DI STUDIO MANAGE DATA IN HADOOP STANDALONE
Creating new data in Hadoop
Transform data inside Hadoop using HiveQL
Access data in Hadoop
DATA MANAGEMENT FOR HADOOP: HIGH PERFORMANCE IN-HADOOP DATA PROCESSING
Harness the power of the Hadoop distributed platform, big data, and SAS data management capabilities
High performance in-database processing
Native capabilities (HiveQL, Pig, MR) + Value-Added capabilities
• SAS Code Accelerator
• SAS Data Quality Accelerator
Embedded into Hadoop
[Diagram labels: Hadoop Cluster; HDFS / Raw Files; MapReduce, Pig, HiveQL; SAS Code Accelerator; SAS Data Quality Accelerator; SAS Servers]
SAS LEVERAGES HADOOP FOR MAXIMUM PERFORMANCE
DATA MANAGEMENT FOR HADOOP: SELF-SERVICE DATA QUERY AND TRANSFORMATION
New SAS Web-Based Business User Interface
Users are able to manage big data
• Query, Select, Filter, Summarize & Transform data
• Use data quality
• Load data into SAS LASR
[Diagram labels: Hadoop Cluster; SAS Data Quality Accelerator; SAS Code Accelerator; HiveQL, Pig, MapReduce]
Feature preview: https://www.youtube.com/watch?v=6-9zcKQjCUs
SAS® VISUAL STATISTICS 6.4
EXTENDING SAS VISUAL ANALYTICS FOR MORE ANALYTIC CONTROL AND TARGETED ACTIONS
Advanced Modeling Techniques
IN SUMMARY: SAS BUSINESS ANALYTICS FRAMEWORK
... GIVES OUR CUSTOMERS THE POWER TO KNOW!
SAS does this with support for Hadoop in all core areas – this is unique to SAS!
Any business use case you can think of will need all of these!