HADOOP & SAS Data Loader for HADOOP...Nov 18, 2014  · SAS Forum Twitter Contest –Tweet to win...

Copyright © 2014, SAS Institute Inc. All rights reserved. Turning Data into Value Hadoop & SAS Data Loader for Hadoop Sebastiaan Schaap Frederik Vandenberghe


Turning Data into Value

Hadoop & SAS Data Loader for Hadoop

Sebastiaan Schaap

Frederik Vandenberghe


Agenda

• What’s Hadoop
• SAS Data Management: Traditional, In-Database, In-Memory
• The Hadoop analytics lifecycle
• SAS Data Loader for Hadoop
• Demo


Market trends: How much does this 3 TB drive cost?

Year     Cost of 3 TB
Today    $115
2010     $270
2005     $3,720
2000     $33,000
1995     $3,360,000
1990     $33,600,000
1985     $315,000,000
1980     $1,312,500,000
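The price table is easier to compare when expressed per gigabyte. The following is a minimal sketch of that arithmetic, not part of the original deck; it assumes decimal units (3 TB = 3,000 GB), and the names `COST_OF_3TB_USD`, `cost_per_gb` and `price_drop_factor` are illustrative only.

```python
# Price of 3 TB of disk storage by year, in US dollars, copied from the slide.
COST_OF_3TB_USD = {
    "1980": 1_312_500_000,
    "1985": 315_000_000,
    "1990": 33_600_000,
    "1995": 3_360_000,
    "2000": 33_000,
    "2005": 3_720,
    "2010": 270,
    "Today": 115,
}

# Assumption: decimal units, so 3 TB = 3,000 GB.
CAPACITY_GB = 3_000


def cost_per_gb(total_cost_usd):
    """Return the price of one gigabyte given the price of 3 TB."""
    return total_cost_usd / CAPACITY_GB


def price_drop_factor(old_cost_usd, new_cost_usd):
    """Return how many times cheaper the newer price is than the older one."""
    return old_cost_usd / new_cost_usd


if __name__ == "__main__":
    for year, cost in COST_OF_3TB_USD.items():
        print(f"{year:>5}: ${cost_per_gb(cost):,.4f} per GB")
```

Under these assumptions, one gigabyte goes from $437,500 in 1980 to just under 4 cents today, a drop by a factor of roughly 11 million.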


Tech trends: How long does it take to read 3 TB?

Disks         Time to read 3 TB
1 disk        4.17 hr
100 disks     2.5 min
1000 disks    15 sec
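The three read times follow from simple division if the data is spread evenly over the disks and all disks are read in parallel. The sketch below reproduces the figures under assumptions the slide does not state: a sustained throughput of 200 MB/s per disk, decimal units (3 TB = 3 × 10^12 bytes), and no coordination overhead. The function name `read_time_seconds` is illustrative.

```python
# Assumption: each disk sustains 200 MB/s of sequential reads.
THROUGHPUT_BYTES_PER_S = 200e6

# Assumption: decimal units, so 3 TB = 3e12 bytes.
DATA_BYTES = 3e12


def read_time_seconds(data_bytes, disks, throughput=THROUGHPUT_BYTES_PER_S):
    """Time to read data_bytes when it is split evenly over `disks` disks
    that are all read in parallel, ignoring any coordination overhead."""
    return data_bytes / (disks * throughput)


if __name__ == "__main__":
    for disks in (1, 100, 1000):
        seconds = read_time_seconds(DATA_BYTES, disks)
        print(f"{disks:>4} disk(s): {seconds:,.0f} s "
              f"= {seconds / 60:,.2f} min = {seconds / 3600:,.2f} hr")
```

With these inputs a single disk needs 15,000 seconds (about 4.17 hours), 100 disks need 150 seconds (2.5 minutes), and 1000 disks need 15 seconds, which matches the slide. Real clusters will be slower because of network and scheduling overhead.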


What is it?

• Distributed processing of large data sets across clusters of computers using simple programming models

• Runs on a single machine or scales out to multiple machines

• Data processing framework and a distributed file system for data storage (HDFS)
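The "simple programming models" in the bullets above most commonly means MapReduce, which the deck names later as a core Hadoop component. The sketch below illustrates the idea with a word count in plain, single-process Python. It does not use Hadoop's actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical; on a real cluster the map and reduce steps would run in parallel on the machines that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

The developer only writes the map and reduce functions; splitting the input, moving intermediate pairs between machines, and retrying failed tasks are left to the framework.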


Traditional vs. In-Database vs. In-Memory

[Figure: three panels labelled Traditional SAS, In-Database and In-Memory; each shows SAS, a Data Store and Data, and the In-Memory panel also shows Memory.]

These approaches are complementary and can be combined for maximum effect.

Even with In-Database processing there will still be some work performed on the SAS server.


SAS and Hadoop

• SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back

• SAS processes data directly in the Hadoop cluster

• SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel


The Hadoop analytics lifecycle

[Figure: lifecycle diagram with the stages below, annotated with SAS products.]

Stages:

• IDENTIFY / FORMULATE PROBLEM
• DATA PREPARATION
• DATA EXPLORATION
• TRANSFORM & SELECT
• BUILD MODEL
• VALIDATE MODEL
• DEPLOY MODEL
• EVALUATE / MONITOR RESULTS

Product annotations:

• SAS Visual Analytics
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop
• Done using either the Data Preparation, Data Exploration or Build Model tools
• SAS High Performance Analytics offerings supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc.
• Done using the Build Model tools and other checks
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
• SAS Visual Analytics
• SAS/Access to Hadoop
• SAS DI & Federation Server
• SAS ESP
• SAS Data Loader
• SAS DI & Federation Server
• SAS Data Loader
• SAS DQ Accelerator for Hadoop
• SAS Code Accelerator for Loader
• SAS/Access to Hadoop


The Hadoop analytics lifecycle

MANAGE DATA

• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Event Stream Processing
• SAS Data Loader for Hadoop
  SAS Data Quality Accelerator for Hadoop
  SAS Code Accelerator for Hadoop

EXPLORE DATA

• SAS Data Loader for Hadoop
• SAS Visual Analytics
• SAS In-Memory Statistics for Hadoop

DEVELOP MODELS

• SAS High Performance Analytics Products
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop

DEPLOY & MONITOR

• SAS Scoring Accelerator for Hadoop
• SAS Decision Manager
• SAS Visual Analytics


SAS Data Management Platform works seamlessly across Hadoop

[Figure: architecture diagram. The labels that survive extraction are listed below.]

• Web Based DM interface for Hadoop
• Self-service data manipulation in Hadoop + Loading into LASR
• SAS/Access to Hadoop, SAS/Access to Impala, BASE SAS, SAS Federation Server
• SAS Event Stream Processing Engine
• Access to HDFS, Hadoop scripting (Pig, MapReduce…) and Hive/Cloudera Impala through SAS coding and GUI + Reuse of DQ and ETL/ELT processing
• SAS DI Studio
• RDBMS
• On-Hadoop data processing
• Data virtualization & masking across Hadoop and other data stores
• All other DM Clients
• Hadoop Accelerated Clients
• BAU SAS DM clients
• Third party clients + SAS BI + SAS Analytics + SAS Solutions
• Other clients
• Bring streaming data from various sources into Hadoop and/or the RDBMS or generate events before data hits downstream store


SAS Data Loader for Hadoop

• Code Accelerator for Hadoop

• Data Quality Accelerator for Hadoop

• Data Loader, the UI

Scoring Accelerator for Hadoop

• Separately licensed product

SAS IN-DATABASE FOR HADOOP


SAS® Data Loader for Hadoop: A new SAS web-based business user interface

• Point & Click
• User Menus
• Little or no Hadoop experience needed
• Self-Service UI
• HTML5 Interface

Enables a self-service approach to managing data in a Hadoop environment


Web Based Data Management interface for Hadoop

Capabilities

• Browser-based + point and click
• No knowledge of Hadoop or SAS is required
• Access and view data in Hadoop
• Query, filter, transform, summarize the data
• Load data into tables as well as SAS LASR
• SAS Data Quality Accelerator for Hadoop

Benefits

• Self-service approach
• Enable the casual user
• Improve data quality
• Minimize movement of data

SAS Data Quality Accelerator for Hadoop and SAS Code Accelerator for Hadoop run in the Hadoop cluster


SAS® Data Loader for Hadoop: What is it?

Web-based interface

• Easy-to-use

• HTML5

Execute code on the Hadoop cluster

• DS2, Hive and Data Quality

Load data into SAS LASR server

vApp


SAS® Data Loader for Hadoop: What is it?

• Non-IT or Business person
• Easy to configure (small configuration list)


vApp: What is a vApp?

• vApp stands for virtual Application
• Fully functional appliance containing a specific set of SAS Software
• Plug-and-Play environment
• Some vApp examples: SAS University Edition, SAS Data Loader and Visual Analytics 6.2 (Cloud only)


vApp

[Figure: vApp stack diagram with the labels CPU, RAM, Storage, Network, Operating System, Applications, vApp Ledger and SAS Solution.]


vApp: How does it integrate with the rest of the environment?

[Figure: integration diagram with the labels Desktop, SAS Data Loader for Hadoop, Metadata, Instructions/queries, Instructions, Data, and the note "Registers loaded LASR tables only".]


SAS® Data Loader for Hadoop: Client-side requirements

• Laptop or desktop running Windows 7 (64-bit)
• 8 GB RAM minimum (16 GB preferred)
• Hardware virtualization enabled in the BIOS (Intel VT-x or AMD-V)
• 20 GB of free disk space
• Capable of installing and running VMware Player 6 or later
• Internet Explorer 9+, Firefox 14+, or Chrome 21+
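The numeric thresholds on this slide can be turned into a quick pre-installation check. The following is a minimal sketch, not a SAS-provided tool: the `ClientSpec` class and `check_client` function are hypothetical, the values are supplied by hand rather than detected, and only the thresholds stated on the slide are encoded (64-bit OS, 8 GB RAM minimum with 16 GB preferred, 20 GB free disk, hardware virtualization such as VT-x or AMD-V enabled).

```python
from dataclasses import dataclass

# Thresholds taken from the slide.
MIN_RAM_GB = 8
PREFERRED_RAM_GB = 16
MIN_FREE_DISK_GB = 20


@dataclass
class ClientSpec:
    """Hand-entered description of the client machine (hypothetical type)."""
    is_64bit: bool
    ram_gb: float
    free_disk_gb: float
    virtualization_enabled: bool  # VT-x or AMD-V switched on in the BIOS


def check_client(spec):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not spec.is_64bit:
        problems.append("a 64-bit operating system is required")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB RAM is required")
    elif spec.ram_gb < PREFERRED_RAM_GB:
        # Meets the minimum but not the preferred amount: warn, don't fail.
        print(f"note: {PREFERRED_RAM_GB} GB RAM is preferred")
    if spec.free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"at least {MIN_FREE_DISK_GB} GB free disk is required")
    if not spec.virtualization_enabled:
        problems.append("hardware virtualization must be enabled in the BIOS")
    return problems


if __name__ == "__main__":
    laptop = ClientSpec(is_64bit=True, ram_gb=8, free_disk_gb=12,
                        virtualization_enabled=True)
    for problem in check_client(laptop):
        print("FAIL:", problem)
```

Browser and VMware versions are left out of the sketch because checking them reliably depends on the local installation.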


SAS® Data Loader for Hadoop: Installation process

Phases: Pre-requisites, Deploy, Integrate, Test

Steps, in the order shown on the slide:

• VMware Player
• SAS Software Depot
• Hadoop Cluster
• SAS Embedded Process
• Firewall
• Shared Folder
• VM Configuration & deploy
• Startup
• Apply SAS License
• Application page
• Hadoop configuration inside the Data Loader
• Optional: LASR Configuration
• Navigate in Hadoop
• Do a transformation
• Filter & query
• Run SAS Code
• Load to LASR


Key take-aways

• SAS Data Management graphical user interfaces accelerate the adoption of Hadoop

• SAS Data Management provides the flexibility to work with Hadoop as a new data store alongside traditional data stores using a single platform

• Existing SAS customers can leverage their SAS skills and existing data management assets developed with SAS when using Hadoop


Big Data + Hadoop = Big Data collection for the technical user

Big Data + Hadoop + SAS = Accessibility for everybody in the organization

• Business users consume the big Hadoop data

• Business analysts explore & visualize

• Data scientists develop and deploy analytical models

Decisions built on fact-based analytical insights into all of the data

NEW workshop SAS & Hadoop, getting the value out of Big Data

18 Nov. 2014

All details on www.sas.com/belux/training

SAS & Hadoop, getting the value out of Big Data


SAS Forum Twitter Contest: Tweet to win prizes!

5. Which are the 2 core components of every Hadoop installation?

A. HDFS & Hive

B. Cloudera & HDFS

C. HDFS & MapReduce

Tweet your answer in the form: start of your tweet, question number, your answer.

Example: @spicyanalytics 5X

Prizes to win:

1st prize: a ticket for Analytics 2015

2nd prize: a book by Prof. Bart Baesens: “Analytics in a Big Data World”

3rd to 30th prize: chocolates with pepper

Winners will be contacted after the Forum!




SAS® Data Loader for Hadoop: Transform Data in Hadoop

• Filtering Rules
• Column Selections
• Aggregation

No coding, scripting or specialized skills required


SAS® Data Loader for Hadoop: Query Hadoop Data

• Select Source Tables
• Apply Query Criteria
• See subset of data in Table Viewer

Simple drag & drop approach to query data inside Hadoop


SAS® Data Loader for Hadoop: Profile Hadoop Data

• Select Source Table
• View Reports in Column Display
• View Reports in Table Display

Run standard metrics on data inside Hadoop and generate reports


View Data


SAS® Data Loader for Hadoop: Copy Data to Distributed SAS® LASR Server

• Select Source Table
• Copy Data to distributed SAS® LASR Servers
• Optional: Visualize Data (SAS® Visual Analytics)

Explore Hadoop data quickly and easily for faster insights
