HADOOP & SAS Data Loader for HADOOP...Nov 18, 2014  · SAS Forum Twitter Contest –Tweet to win...


Copyright © 2014, SAS Institute Inc. All rights reserved.

Turning Data into Value

Hadoop & SAS Data Loader for Hadoop

Sebastiaan Schaap, Frederik Vandenberghe


Agenda

• What is Hadoop?
• SAS data management: traditional, in-database and in-memory
• The Hadoop analytics lifecycle
• SAS Data Loader for Hadoop
• Demo


Market trends

How much does a 3 TB drive cost?

• 1980: $1,312,500,000
• 1985: $315,000,000
• 1990: $33,600,000
• 1995: $3,360,000
• 2000: $33,000
• 2005: $3,720
• 2010: $270
• Today: $115


Tech trends

How long does it take to read 3 TB?

• 1 disk: 4.17 hr
• 100 disks: 2.5 min
• 1000 disks: 15 sec
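The read times above follow from simple division. A minimal sketch, assuming each disk streams at about 200 MB/s (the rate implied by the slide's own 4.17 hr figure for a single disk) and that the data is spread evenly over disks that are read fully in parallel; real clusters add coordination overhead that this ignores:

```python
# Reproduce the "How long does it take to read 3 TB?" figures.
# Assumption (not stated on the slide): one disk reads about 200 MB/s, and
# the 3 TB is split evenly over disks that are read fully in parallel.

DATA_MB = 3_000_000      # 3 TB expressed in decimal megabytes
DISK_MB_PER_S = 200      # assumed sustained throughput of one disk


def read_seconds(disks: int) -> float:
    """Seconds needed to read DATA_MB when it is split evenly over `disks` disks."""
    return DATA_MB / (DISK_MB_PER_S * disks)


for disks in (1, 100, 1000):
    s = read_seconds(disks)
    print(f"{disks:>5} disk(s): {s:8.0f} s = {s / 60:7.2f} min = {s / 3600:5.2f} hr")
```

Going from 1 to 1000 disks divides the read time by 1000, which is the core argument for spreading data over many commodity machines.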


What is Hadoop?

• Distributed processing of large data sets across clusters of computers using simple programming models
• Runs on a single machine or on many machines
• A data processing framework (MapReduce) and a distributed file system for data storage (HDFS)
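The best-known of these simple programming models is MapReduce. The sketch below is a single-process Python illustration of the model (map, shuffle, reduce) applied to a word count; it is not Hadoop's Java API, and the function names are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a Hadoop cluster, each input split would be mapped on a different node
    # and each key group reduced in parallel; here everything runs in one process.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["SAS and Hadoop", "Hadoop stores data in HDFS"]))
```

Because the mapper looks at one record at a time and the reducer at one key at a time, both phases can be distributed across machines without changing the program logic.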


Traditional vs. In-Database vs. In-Memory

[Figure: three architectures, each with a data store and SAS: Traditional SAS, In-Database and In-Memory]

Even with in-database processing, some work is still performed on the SAS server.

These approaches are complementary and can be combined for maximum effect.


SAS and Hadoop

• Traditional: SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes the results back.
• In-Database: SAS processes data directly in the Hadoop cluster.
• In-Memory: SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel.


The Hadoop analytics lifecycle

Stages: identify / formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate / monitor results.

• Data preparation: SAS/Access to Hadoop, SAS DI & Federation Server, SAS ESP, SAS Data Loader, SAS DQ Accelerator for Hadoop, SAS Code Accelerator for Hadoop
• Data exploration: SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop
• Transform & select: done using the data preparation, data exploration or build model tools
• Build model: SAS High Performance Analytics offerings, supported by relevant clients such as SAS Enterprise Miner and SAS/STAT
• Validate model: done using the build model tools and other checks
• Deploy model: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
• Evaluate / monitor results: SAS Visual Analytics


The Hadoop analytics lifecycle

Manage data
• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Event Stream Processing
• SAS Data Loader for Hadoop
• SAS Data Quality Accelerator for Hadoop
• SAS Code Accelerator for Hadoop

Explore data
• SAS Data Loader for Hadoop
• SAS Visual Analytics
• SAS In-Memory Statistics for Hadoop

Develop models
• SAS High Performance Analytics products
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop

Deploy & monitor
• SAS Scoring Accelerator for Hadoop
• SAS Decision Manager
• SAS Visual Analytics


SAS Data Management Platform works seamlessly across Hadoop

• Web-based DM interface for Hadoop: self-service data manipulation in Hadoop, plus loading into LASR
• SAS/Access to Hadoop, SAS/Access to Impala, Base SAS, SAS Federation Server and SAS DI Studio: access to HDFS, Hadoop scripting (Pig, Map Reduce…) and Hive/Cloudera Impala through SAS coding and GUI, plus reuse of DQ and ETL/ELT processing
• On-Hadoop data processing
• Data virtualization and masking across Hadoop and other data stores (RDBMS)
• SAS Event Stream Processing Engine: bring streaming data from various sources into Hadoop and/or the RDBMS, or generate events before data hits the downstream store
• Clients: Hadoop-accelerated clients, BAU SAS DM clients, all other DM clients, and other clients (third-party clients, SAS BI, SAS Analytics and SAS Solutions)


SAS Data Loader for Hadoop

SAS In-Database for Hadoop
• Code Accelerator for Hadoop
• Data Quality Accelerator for Hadoop
• Data Loader, the UI
• Scoring Accelerator for Hadoop (separately licensed product)


SAS® Data Loader for Hadoop

A new SAS web-based business user interface

• Point and click
• User menus
• Little or no Hadoop experience needed
• Self-service UI
• HTML5 interface

Enables a self-service approach to managing data in a Hadoop environment.


Web-based data management interface for Hadoop

Capabilities
• Browser-based, point and click (no knowledge of Hadoop or SAS is required)
• Access and view data in Hadoop
• Query, filter, transform and summarize the data
• Load data into tables as well as into SAS LASR
• SAS Data Quality Accelerator for Hadoop

Benefits
• Self-service approach
• Enables the casual user
• Improves data quality
• Minimizes movement of data

SAS Data Quality Accelerator for Hadoop and SAS Code Accelerator for Hadoop run in the Hadoop cluster.


SAS® Data Loader for Hadoop: what is it?

• Web-based interface: easy to use, HTML5
• Executes code on the Hadoop cluster: DS2, Hive and data quality
• Loads data into SAS LASR Server
• Delivered as a vApp


SAS® Data Loader for Hadoop: what is it?

• Aimed at non-IT or business users
• Easy to configure (small configuration list)


What is a vApp?

• vApp stands for virtual application
• A fully functional appliance containing a specific set of SAS software
• Plug-and-play environment
• Some vApp examples: SAS University Edition, SAS Data Loader and SAS Visual Analytics 6.2 (cloud only)


vApp

[Figure: vApp layers: CPU, RAM, storage, network, operating system, applications, vApp ledger and SAS solution]


vApp: how does it integrate with the rest of the environment?

[Figure: SAS Data Loader for Hadoop runs on the desktop and exchanges instructions/queries and data with the Hadoop environment; in metadata, it registers loaded LASR tables only]


SAS® Data Loader for Hadoop: client-side requirements

• Laptop or desktop running Windows 7 (64-bit)
• 8 GB RAM minimum (16 GB preferred)
• Hardware virtualization enabled in the BIOS (Intel VT-x or AMD-V)
• 20 GB of free disk space
• Able to install and run VMware 6 or later
• Internet Explorer 9+, Firefox 14+ or Chrome 21+


SAS® Data Loader for Hadoop: installation process

• Prerequisites: VMware Player, SAS Software Depot, Hadoop cluster, SAS Embedded Process, firewall, shared folder
• Deploy: VM configuration and deployment, startup, apply the SAS license, application page
• Integrate: Hadoop configuration inside the Data Loader; optional: LASR configuration
• Test: navigate in Hadoop, do a transformation, filter and query, run SAS code, load to LASR


Key takeaways

• SAS Data Management graphical user interfaces accelerate the adoption of Hadoop.
• SAS Data Management provides the flexibility to work with Hadoop as a new data store alongside traditional data stores, using a single platform.
• Existing SAS customers can leverage their SAS skills and the data management assets they have already developed with SAS when using Hadoop.


Big Data + Hadoop = big data collection for the technical user

Big Data + Hadoop + SAS = accessibility for everybody in the organization

• Business users consume the big Hadoop data
• Business analysts explore and visualize
• Data scientists develop and deploy analytical models

Decisions built on fact-based analytical insights into all of the data

NEW workshop: SAS & Hadoop, getting the value out of Big Data
18 Nov. 2014
All details on www.sas.com/belux/training


SAS Forum Twitter contest: tweet to win prizes!

5. Which are the 2 core components of every Hadoop installation?

A. HDFS & Hive
B. Cloudera & HDFS
C. HDFS & MapReduce

Tweet your answer: start of your tweet, then the question number, then your answer.
Example: @spicyanalytics 5X

Prizes to win:
• 1st prize: a ticket for Analytics 2015
• 2nd prize: a book by Prof. Bart Baesens, “Analytics in a Big Data World”
• 3rd to 30th prize: chocolates with pepper

Winners will be contacted after the Forum!


SAS® Data Loader for Hadoop: transform data in Hadoop

• Filtering rules
• Column selections
• Aggregation

No coding, scripting or specialized skills required.
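Conceptually, these three transformation types correspond to a row filter, a column projection and a group-by summary. The Python sketch below only illustrates that logic on a small in-memory list; Data Loader itself generates and runs the work inside the Hadoop cluster, and the sample table, column names and `transform` helper here are hypothetical:

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"region": "BE", "product": "A", "amount": 120.0, "status": "ok"},
    {"region": "BE", "product": "B", "amount": 80.0, "status": "void"},
    {"region": "LU", "product": "A", "amount": 50.0, "status": "ok"},
    {"region": "BE", "product": "A", "amount": 30.0, "status": "ok"},
]


def transform(rows: list[dict[str, Any]], keep: list[str],
              where: Callable[[dict[str, Any]], bool],
              group_by: str, measure: str) -> dict[Any, float]:
    """Filter rows, keep only selected columns, then sum `measure` per `group_by` value."""
    filtered = (r for r in rows if where(r))                # filtering rule
    selected = ({c: r[c] for c in keep} for r in filtered)  # column selection
    totals: dict[Any, float] = defaultdict(float)
    for r in selected:                                      # aggregation
        totals[r[group_by]] += r[measure]
    return dict(totals)


result = transform(rows, keep=["region", "amount"],
                   where=lambda r: r["status"] == "ok",
                   group_by="region", measure="amount")
print(result)
```

The filter runs before the column selection so that it can still test a column (`status`) that the output does not keep.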


SAS® Data Loader for Hadoop: query Hadoop data

• Select source tables
• Apply query criteria
• See a subset of the data in the table viewer

A simple drag-and-drop approach to querying data inside Hadoop.


SAS® Data Loader for Hadoop: profile Hadoop data

Run standard metrics on data inside Hadoop and generate reports.

• Select a source table
• View reports in column display
• View reports in table display
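The slide does not list which standard metrics the profile computes. A minimal sketch, assuming typical column metrics (row count, missing values, distinct values, minimum, maximum) and a hypothetical list-of-dicts table standing in for a Hadoop table:

```python
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Basic profiling metrics for one column; None counts as missing."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column; a key absent from a row is also treated as missing."""
    columns = sorted({c for row in rows for c in row})
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}


# Hypothetical sample table.
sample = [
    {"id": 1, "country": "BE"},
    {"id": 2, "country": None},
    {"id": 3, "country": "BE"},
]
for column, metrics in profile_table(sample).items():
    print(column, metrics)
```

Printing one line per column mirrors the column-oriented report view; the same dictionary could be laid out as a table for the table view.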


View Data


SAS® Data Loader for Hadoop: copy data to a distributed SAS® LASR Server

Explore Hadoop data quickly and easily for faster insights.

• Select a source table
• Copy the data to distributed SAS® LASR servers
• Optional: visualize the data in SAS® Visual Analytics
