HADOOP & SAS Data Loader for HADOOP...Nov 18, 2014 · SAS Forum Twitter Contest –Tweet to win...
Copyright © 2014, SAS Institute Inc. All rights reserved.
Turning Data into Value
Hadoop & SAS Data Loader for Hadoop
Sebastiaan Schaap, Frederik Vandenberghe
Agenda
What is Hadoop?
SAS Data Management: Traditional, In-Database, In-Memory
The Hadoop analytics lifecycle
SAS Data Loader for Hadoop
Demo
Market trends
How much does this 3 TB drive cost?
TODAY: $115
2010: $270
2005: $3,720
2000: $33,000
1995: $3,360,000
1990: $33,600,000
1985: $315,000,000
1980: $1,312,500,000
Tech trends
How long does it take to read 3 TB?
1 disk: 4.17 hr
100 disks: 2.5 min
1000 disks: 15 sec
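The read times on this slide can be reproduced with simple arithmetic. The sketch below assumes a sustained rate of about 200 MB/s per disk (the rate implied by the 4.17-hour single-disk figure; the deck does not state it) and assumes reads parallelize perfectly across disks. The names DATA_BYTES, DISK_RATE and read_seconds are illustrative only.

```python
# Assumption: each disk sustains roughly 200 MB/s of sequential reads,
# and the 3 TB is split evenly so all disks read in parallel.
DATA_BYTES = 3 * 10**12      # 3 TB, using decimal terabytes
DISK_RATE = 200 * 10**6      # bytes per second per disk (assumed)


def read_seconds(num_disks: int) -> float:
    """Seconds needed to read DATA_BYTES spread evenly over num_disks."""
    return DATA_BYTES / (DISK_RATE * num_disks)


if __name__ == "__main__":
    for disks in (1, 100, 1000):
        seconds = read_seconds(disks)
        print(f"{disks:>4} disks: {seconds:>7.0f} s ({seconds / 3600:.2f} hr)")
```

Under these assumptions one disk needs 15,000 seconds (about 4.17 hours), 100 disks need 150 seconds (2.5 minutes) and 1000 disks need 15 seconds, which matches the slide.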
What is it?
• Distributed processing of large data sets across clusters of computers using simple programming models
• Runs on a single machine or on multiple machines
• A data processing framework plus a distributed file system for data storage (HDFS)
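To make "simple programming models" concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps behind MapReduce, the processing model this deck names alongside HDFS. It is plain Python and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

On a real cluster the map and reduce steps would run in parallel on the machines that hold the data; the sketch only shows how little logic the programmer has to supply.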
Traditional vs. In-Database vs. In-Memory
[Diagram: three panels labelled Traditional SAS, In-Database and In-Memory, each showing a Data Store and SAS]
Even with In-Database processing there will still be some work performed on the SAS server
These approaches are complementary & can be combined for maximum effect
SAS and Hadoop
• SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back
• SAS processes data directly in the Hadoop cluster
• SAS accesses and processes Hadoop data on SAS Servers while keeping the data and computations massively parallel
The Hadoop analytics lifecycle
[Diagram: lifecycle stages annotated with the SAS products that support them]
Stages: Identify / Formulate Problem, Data Preparation, Data Exploration, Transform & Select, Build Model, Validate Model, Deploy Model, Evaluate / Monitor Results
SAS Visual Analytics
SAS Visual Statistics
SAS In-Memory Statistics for Hadoop
Done using either the Data Preparation, Data Exploration or Build Model Tools
SAS High Performance Analytics Offerings supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc.
Done using the Build Model Tools and other checks
SAS Scoring Accelerator for Hadoop
SAS Code Accelerator for Hadoop
SAS Visual Analytics
SAS/Access to Hadoop
SAS DI & Federation Server
SAS ESP
SAS Data Loader
SAS DI & Federation Server
SAS Data Loader
SAS DQ Accelerator for Hadoop
SAS Code Accelerator for Loader
SAS/Access to Hadoop
The Hadoop analytics lifecycle
MANAGE DATA
• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Event Stream Processing
• SAS Data Loader for Hadoop
• SAS Data Quality Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
EXPLORE DATA
• SAS Data Loader for Hadoop
• SAS Visual Analytics
• SAS In-memory Statistics for Hadoop
DEVELOP MODELS
• SAS High Performance Analytics Products
• SAS Visual Statistics
• SAS In-memory Statistics for Hadoop
DEPLOY & MONITOR
• SAS Scoring Accelerator for Hadoop
• SAS Decision Manager
• SAS Visual Analytics
SAS Data Management Platform works seamlessly across Hadoop
[Diagram: SAS data management components and clients around Hadoop and the RDBMS]
Web Based DM interface for Hadoop
Self-service data manipulation in Hadoop + Loading into LASR
SAS/Access to Hadoop, SAS/Access to Impala, BASE SAS, SAS Federation Server
SAS Event Stream Processing Engine
Bring streaming data from various sources into Hadoop and/or the RDBMS or generate events before data hits downstream store
SAS DI Studio
Access to HDFS, Hadoop scripting (Pig, Map Reduce…) and HIVE/Cloudera Impala through SAS coding and GUI + Reuse of DQ and ETL/ELT processing
RDBMS
On-Hadoop data processing
Data virtualization & masking across Hadoop and other data stores
Hadoop Accelerated Clients
BAU SAS DM clients
All other DM Clients
Other clients
Third party clients + SAS BI + SAS Analytics + SAS Solutions
SAS In-Database for Hadoop
SAS Data Loader for Hadoop
• Code Accelerator for Hadoop
• Data Quality Accelerator for Hadoop
• Data Loader, the UI
Scoring Accelerator for Hadoop
• Separately licensed product
SAS® Data Loader for Hadoop
A new SAS web-based business user interface
Point & Click
User Menus
Little or no Hadoop experience needed
Self-Service UI
HTML5 Interface
Enables a self-service approach to managing data in a Hadoop environment
Web Based Data Management interface for Hadoop
Capabilities
Browser-based + point and click
No knowledge of Hadoop or SAS is required
Access and view data in Hadoop
Query, filter, transform, summarize the data
Load data into tables as well as SAS LASR
SAS Data Quality Accelerator for Hadoop
Benefits
Self-service approach
Enables the casual user
Improve data quality
Minimize movement of data
SAS Data Quality Accelerator for Hadoop and SAS Code Accelerator for Hadoop run in the Hadoop cluster
SAS® Data Loader for Hadoop: What is it?
Web-based interface
• Easy-to-use
• HTML5
Execute code on the Hadoop cluster
• DS2, Hive and Data Quality
Load data into SAS LASR server
vApp
SAS® Data Loader for Hadoop: What is it?
For the non-IT or business person
Easy to configure (small configuration list)
vApp: What is a vApp?
vApp stands for virtual Application
Fully functional appliance containing a specific set of SAS Software
Plug-and-Play environment
Some vApp examples: SAS University Edition, SAS Data Loader and Visual Analytics 6.2 (Cloud only)
vApp
[Diagram labels: CPU, RAM, Storage, Network, Operating System, Applications, vApp Ledger, SAS Solution]
vApp: How does it integrate with the rest of the environment?
[Diagram labels: Desktop, SAS Data Loader for Hadoop, Metadata, Instructions/queries, Instructions, Data]
Registers loaded LASR tables only
SAS® Data Loader for Hadoop
Client-side requirements
Laptop or desktop running Windows 7 (64-bit)
8 GB RAM minimum (16 GB preferred)
Hardware virtualization enabled in the BIOS (Intel VT-x or AMD-V)
20 GB of free disk space
Capable of installing and running VMware Player 6 or later
Internet Explorer 9+, Firefox 14+, or Chrome 21+
SAS® Data Loader for Hadoop: Installation process
Pre-requisites
• VMware Player
• SAS Software Depot
• Hadoop Cluster
• SAS Embedded Process
• Firewall
• Shared Folder
Deploy
• VM Configuration & deploy
• Startup
• Apply SAS License
• Application page
Integrate
• Hadoop configuration inside the Data Loader
• Optional: LASR Configuration
Test
• Navigate in Hadoop
• Do a transformation
• Filter & query
• Run SAS Code
• Load to LASR
Key take-aways
• SAS Data Management graphical user interfaces accelerate the adoption of Hadoop
• SAS Data Management provides the flexibility to work with Hadoop as a new data store alongside traditional data stores using a single platform
• Existing SAS customers can leverage their SAS skills and existing data management assets developed with SAS when using Hadoop
Turning Data into Value
SAS & Hadoop, getting the value out of Big Data
Big Data + Hadoop =
Big Data Collection for the technical user
Big Data + Hadoop + SAS =
Accessibility for everybody in the organization
• Business users consume the big Hadoop data
• Business analysts explore & visualize
• Data Scientists develop and deploy analytical models
Decisions built on fact-based analytical insights into all of the data
NEW workshop: SAS & Hadoop, getting the value out of Big Data
18 Nov. 2014
All details on www.sas.com/belux/training
SAS Forum Twitter Contest – Tweet to win prizes!
5. Which are the 2 core components of every Hadoop installation?
A. HDFS & Hive
B. Cloudera & HDFS
C. HDFS & MapReduce
Tweet your answer:
Example: @spicyanalytics 5X (start of your tweet, question number, your answer)
Prizes to win:
1st prize: a ticket for Analytics 2015
2nd prize: a book by Prof. Bart Baesens: “Analytics in a Big Data World”
3rd to 30th prize: chocolates with pepper
Winners will be contacted post-Forum!
SAS® Data Loader for Hadoop: Transform Data in Hadoop
Filtering Rules
Column Selections
Aggregation
No coding, scripting or specialized skills required
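For readers who want to see what the three transformation steps amount to, the following is a minimal plain-Python sketch of a filter, a column selection and an aggregation. It is not the code SAS Data Loader generates and does not run in Hadoop; the function name transform, its parameters and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a table stored in Hadoop.
SAMPLE_ROWS = [
    {"region": "BE", "product": "A", "amount": 100.0, "year": 2014},
    {"region": "BE", "product": "B", "amount": 50.0, "year": 2014},
    {"region": "NL", "product": "A", "amount": 70.0, "year": 2014},
    {"region": "NL", "product": "A", "amount": 30.0, "year": 2013},
]


def transform(rows, predicate, columns, group_by, measure):
    """Apply a filtering rule, keep selected columns, then sum a measure per group."""
    # 1. Filtering rule: keep only rows that satisfy the predicate.
    filtered = [row for row in rows if predicate(row)]
    # 2. Column selection: drop every column that was not asked for.
    selected = [{col: row[col] for col in columns} for row in filtered]
    # 3. Aggregation: total the measure for each value of the grouping column.
    totals = defaultdict(float)
    for row in selected:
        totals[row[group_by]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    result = transform(
        SAMPLE_ROWS,
        predicate=lambda row: row["year"] == 2014,
        columns=["region", "amount"],
        group_by="region",
        measure="amount",
    )
    print(result)
```

In the product these choices are made by pointing and clicking, and the work is pushed to the cluster rather than done in a local loop.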
SAS® Data Loader for Hadoop: Query Hadoop data
Select Source Tables
Apply Query Criteria
See subset of data in Table Viewer
Simple Drag & Drop approach to Query Data inside Hadoop
SAS® Data Loader for Hadoop: Profile Hadoop Data
Select Source Table
View Reports in Column Display
View Reports in Table Display
Run standard metrics on data inside Hadoop and generate reports
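The deck does not list which "standard metrics" the profile report contains. As a rough illustration only, the sketch below computes a few metrics that data profiling commonly includes (row count, null count, distinct count, minimum, maximum) for each column of a small in-memory table. The function names and the sample rows are hypothetical, and this is plain Python rather than anything that runs inside Hadoop.

```python
# Hypothetical sample data standing in for a Hadoop source table.
SAMPLE_ROWS = [
    {"age": 30, "city": "Gent"},
    {"age": None, "city": "Gent"},
    {"age": 45, "city": "Brussel"},
]


def profile_column(values):
    """Return basic profiling metrics for one column's values."""
    non_null = [value for value in values if value is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows):
    """Profile every column, taking column names from the first row."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


if __name__ == "__main__":
    for column, metrics in profile_table(SAMPLE_ROWS).items():
        print(column, metrics)
```

The per-column dictionaries correspond loosely to a column-oriented report; printing them side by side would give a table-style view.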
View Data
SAS® Data Loader for Hadoop: Copy Data to Distributed SAS® LASR Server
Select Source Table
Copy Data to distributed SAS® LASR Servers
Optional: Visualize Data (SAS® Visual Analytics)
Explore Hadoop data quickly and easily for faster insights