WELCOME TO: WHAT, WHEN, WHY OF SAS/ACCESS

Copyright © 2015, SAS Institute Inc. All rights reserved. WELCOME TO: WHAT, WHEN, WHY OF SAS ® /ACCESS Presented by Jeff Simpson SAS Customer Loyalty

WELCOME TO:

WHAT, WHEN, WHY OF SAS®/ACCESS

Presented by

Jeff Simpson

SAS Customer Loyalty

Goal and takeaways

By the end of this meeting, you will understand the key characteristics, capabilities, and efficiencies of SAS/ACCESS interfaces.

• What are SAS/ACCESS interfaces?
• What capabilities do they provide?
• When should they be used? Why?
• Performance tips and hints (Hadoop, ODBC, Oracle, Netezza, SQL Server, Teradata)
• Recommended Resources

Remember this

Conduct as much in-database processing as possible so your analytics can run faster.

[Diagram: With In-Database (labels: SAS, data)]

Why use SAS/ACCESS interfaces?

So your analytics can consume data from, and disseminate results to, diverse data sources and targets.

Distinguishing DBMS and SAS datasets

Factors to Consider        DBMS                     SAS dataset
operating system           any                      any
purpose                    transactional            analytics
concurrent / multi-user    yes                      yes
language                   SQL                      SAS & SQL
method                     (non)sequential reads    (non)sequential reads
tenure                     long established         long established (1976)
scalable                   yes                      yes

What data can Base SAS read from / write to? (without SAS/ACCESS)

• delimited
• flat file
• XML
• mainframe VSAM, EBCDIC
• ASCII
• JMP

What are SAS/ACCESS interfaces?

Conduct read/write operations to/from SAS

• Aster Data
• DB2
• Cloudera Impala
• Greenplum
• IBM PureData
• MySQL
• ODBC and OLE DB
• Oracle & Oracle Exadata
• PC file formats
• PostgreSQL (including Amazon Redshift)
• SAP HANA
• Hadoop
• SQL Server
• Sybase
• Teradata
• others

What operations can SAS/ACCESS interfaces perform?
When should we use SAS/ACCESS interfaces?

• APPEND
• INSERT
• LOAD / FAST LOAD
• READ
• UPDATE
• WRITE
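As a rough illustration of the operations listed above, the following sketch assumes a hypothetical Oracle library named ora and hypothetical tables (orders, orders_new, and so on). The connection values are placeholders, none of these names come from the presentation, and bulk loading depends on what your interface and DBMS client support.

```sas
/* Assumption: connection values and all table names are placeholders. */
libname ora oracle path=mypath schema=myschema user=myuser password=mypwd;

/* READ: copy a DBMS table into a SAS dataset */
data work.orders;
   set ora.orders;
run;

/* WRITE: create a DBMS table from a SAS dataset */
data ora.orders_copy;
   set work.orders;
run;

/* APPEND: add rows to an existing DBMS table */
proc append base=ora.orders data=work.orders_new;
run;

/* INSERT and UPDATE through PROC SQL */
proc sql;
   insert into ora.orders (order_id, quantity)
      values (1001, 5);
   update ora.orders
      set quantity = 6
      where order_id = 1001;
quit;

/* LOAD / FAST LOAD: ask for the DBMS bulk-load facility,
   where the interface supports the BULKLOAD= data set option */
data ora.orders_bulk (bulkload=yes);
   set work.orders;
run;
```

In each step the DBMS table is referenced through the library, so the same SAS syntax applies whichever interface sits behind it.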

Why not just write SQL?

A rank example

WITH "subquery0" ( "COSTPRICE_PER_UNIT", "DISCOUNT", "ORDER_ID", "ORDER_ITEM_NUM",

"PRODUCT_ID", "QUANTITY", "TOTAL_RETAIL_PRICE" ) AS ( SELECT "COSTPRICE_PER_UNIT", "DISCOUNT",

"ORDER_ID", "ORDER_ITEM_NUM", "PRODUCT_ID", "QUANTITY", "TOTAL_RETAIL_PRICE" FROM

"DB2_ORDER_ITEM" ) SELECT "table0"."ORDER_ID", "table0"."ORDER_ITEM_NUM",

"table0"."PRODUCT_ID", "table0"."QUANTITY", "table0"."TOTAL_RETAIL_PRICE",

"table0"."COSTPRICE_PER_UNIT", "table0"."DISCOUNT", "table2"."rankalias1" AS "QUANTITYRANK",

"table1"."rankalias0" AS "PRODUCTRANK" FROM "subquery0" AS "table0" LEFT JOIN ( SELECT DISTINCT

"PRODUCT_ID", "tempcol0" AS "rankalias0" FROM ( SELECT "PRODUCT_ID", MIN( "tempcol1" ) OVER (

PARTITION BY "PRODUCT_ID" ) AS "tempcol0" FROM ( SELECT "PRODUCT_ID", CAST( ROW_NUMBER() OVER (

ORDER BY "PRODUCT_ID" DESC ) AS DOUBLE PRECISION ) AS "tempcol1" FROM "subquery0" WHERE ( (

"PRODUCT_ID" IS NOT NULL ) ) ) AS "subquery2" ) AS "subquery1" ) AS "table1" ON ( (

"table0"."PRODUCT_ID" = "table1"."PRODUCT_ID" ) ) LEFT JOIN ( SELECT DISTINCT "QUANTITY",

"tempcol2" AS "rankalias1" FROM ( SELECT "QUANTITY", MIN( "tempcol3" ) OVER ( PARTITION BY

"QUANTITY" ) AS "tempcol2" FROM ( SELECT "QUANTITY", CAST( ROW_NUMBER() OVER ( ORDER BY

"QUANTITY" DESC ) AS DOUBLE PRECISION ) AS "tempcol3" FROM "subquery0" WHERE ( ( "QUANTITY" IS

NOT NULL ) ) ) AS "subquery4" ) AS "subquery3" ) AS "table2" ON ( ( "table0"."QUANTITY" =

"table2"."QUANTITY" ) )

Why not just write SQL?

Because writing SAS is shorter, faster, and easier to maintain.

PROC RANK example:

proc rank data=indb2.db2_order_item out=work.order descending ties=low;
   var quantity product_id;
   ranks QuantityRank ProductRank;
run;

PROC RANK can also run in-database.

How are SAS/ACCESS interfaces licensed?

It depends:

• Part of a package, like SAS Office Analytics
• A la carte / individually (Base, SAS/STAT, SAS/ACCESS)
• Both

Distinguishing ODBC & Database-Specific SAS/ACCESS Interfaces (1 of 2)

SAS/ACCESS Interface to ODBC

• ODBC drivers connect SAS and other technologies to/from any ODBC-enabled data source/target
• Microsoft and others provide Windows ODBC drivers free or at a minimal cost
• ODBC drivers in non-Windows environments can be costly
• ODBC drivers come with your database or are purchased separately

Distinguishing ODBC & Database-Specific SAS/ACCESS Interfaces (2 of 2)

[Diagram, SAS/ACCESS Interface to ODBC: SAS, ODBC program interface, ODBC driver, data]

[Diagram, SAS/ACCESS Interface to [Oracle, Teradata, DB2, etc.]: SAS, DBMS client installed and configured, data]
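In code, the two routes differ mainly in the engine named on the LIBNAME statement. The sketch below is illustrative only: the data source name mydsn, the Oracle path alias mypath, the schema, the credentials, and the orders table are all placeholders that do not appear in the presentation.

```sas
/* Generic route: SAS/ACCESS Interface to ODBC.
   Assumption: "mydsn" is a data source already defined for the ODBC driver. */
libname viaodbc odbc datasrc=mydsn user=myuser password=mypwd;

/* Database-specific route: SAS/ACCESS Interface to Oracle.
   Assumption: "mypath" is an alias known to the installed Oracle client. */
libname viaora oracle path=mypath schema=myschema user=myuser password=mypwd;

/* Either library is then referenced like any other SAS library. */
proc contents data=viaora.orders;
run;
```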

Distinguishing ODBC and OLE DB

Factors to Consider              ODBC                                     OLE DB
operating system                 Windows & Unix                           Windows
multidimensional data support    no                                       yes
concurrent / multi-user          no                                       yes
method                           SQL                                      multiple
terminology                      driver                                   provider
tenure                           long established                         newer
costs                            more low/no cost resources on Windows    fewer low/no cost resources

More details: http://ftp.sas.com/techsup/download/v8papers/odbcdb.pdf

Distinguishing ODBC and SAS-Supplied Drivers

Database                                                      SAS-supplied Driver?    ODBC-Based?
Aster, Impala, Informix, Netezza, ODBC, Sybase IQ, Vertica    no                      yes
DB2, Hadoop, MySQL, Oracle, Sybase, Teradata                  no                      no
Greenplum, PostgreSQL, SAP HANA, SQL Server                   yes                     yes
PC Files                                                      not applicable          not applicable

DBMS Requirements and Configuration Notes:
http://support.sas.com/documentation/installcenter/en/ikfdtnunxcg/66380/PDF/default/config.pdf

System Requirements Notes:
http://support.sas.com/documentation/installcenter/en/ikfdtnlaxsr/66396/PDF/default/sreq.pdf

Support for Hadoop: Foundation SAS

Foundation SAS offers support for Hadoop through:

• Base SAS
• SAS/ACCESS Interface to Hadoop (Hive)
• SAS/ACCESS Interface to HAWQ
• SAS/ACCESS Interface to Impala

What about Amazon Redshift?

SAS supports Amazon Redshift via the SAS/ACCESS Interface to ODBC.

An ODBC driver is available from Amazon. Information about it can be found here:
http://docs.aws.amazon.com/redshift/latest/mgmt/install-odbc-driver-linux.html
http://docs.aws.amazon.com/redshift/latest/mgmt/odbc-driver-configure-linux-mac.html

Once this ODBC driver is installed and configured, the SAS/ACCESS Interface to ODBC engine can connect to it:
http://support.sas.com/documentation/cdl/en/acreldb/67589/HTML/default/viewer.htm#p1g72kbb0m01y1n1gm1lh532n5ru.htm

This SAS Global Forum paper explains in more detail how these technologies operate together:
http://support.sas.com/resources/papers/proceedings15/SAS1789-2015.pdf

Distinguishing Traditional Processing & In-Database

Conduct as much in-database processing as possible

[Diagram: Without In-Database vs. With In-Database (labels: SAS, program interface, data)]

Avoid heterogeneous joins

[Diagram: Hadoop, ODBC, Oracle, Teradata, and SAS dataset sources; SAS/ACCESS Interface (program interface); data; SAS]

• Join takes place on SAS server
• ALL data moves to SAS first
• SAS extracts, queries, summarizes…
• Your results may cause more data movement…

Minimize Data Returned to SAS for Processing

Avoid heterogeneous or federated joins

Homogeneous

LIBNAME MTG '/sas/data/mortgage/';
LIBNAME HPI '/sas/data/housing_data/';

PROC SQL;
   CREATE TABLE MTG.MYDATA AS
   SELECT M.LTV, H.CURR_PROP_AMT
   FROM MTG.MORTGAGE_DATA AS M
   JOIN HPI.HOUSING_INDEX AS H
   ON M.ACCT_NUM = H.ACCT_NUM;
QUIT;

Heterogeneous

LIBNAME MTG '/sas/data/mortgage/';
LIBNAME DRI_DBO Teradata
   Datasrc=DRI_CITY SCHEMA=dbo
   USER=&userid PASSWORD=&pwd;

PROC SQL;
   CREATE TABLE MTG.MYDATA AS
   SELECT M.LTV, D.REO_DATE
   FROM MTG.MORTGAGE_DATA AS M
   JOIN DRI_DBO.FLAT_REO AS D
   ON M.ACCT_NUM = D.ACCT_NUM;
QUIT;

Avoid heterogeneous joins

1. To merge SAS (or other) data with DBMS
• use pass-through SQL queries to process only the data you need on DBMS
• save the results to a SAS dataset
• merge all other SAS datasets with the newly created dataset
+ creates a homogeneous SAS data environment
+ you may not have to know DB-specific SQL
- can be inefficient; sacrifices some in-database processing

2. To filter large amounts of DBMS data based on a smaller SAS (or other) dataset
• load the smaller SAS (or other) dataset into DBMS
• use pass-through SQL queries to process in-database (filter before join)
+ creates a homogeneous DBMS data environment
+ can gain in-database processing efficiencies
- you may have to know DB-specific SQL
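The two approaches above might look roughly like this in code. This is a sketch under several assumptions: it reuses the MTG library and the MORTGAGE_DATA and FLAT_REO tables from the earlier join example, while the server name mytdserver, the library td, the tables work.reo, mortgage_keys, and mtg.mydata2, and the REO_DATE filter are hypothetical. It also assumes the tables sit in the default database of the Teradata user.

```sas
libname mtg '/sas/data/mortgage/';

/* Approach 1: reduce the data on the DBMS, then merge inside SAS */
proc sql;
   connect to teradata (user=&userid password=&pwd server=mytdserver);

   /* Only the rows and columns that are needed come back to SAS */
   create table work.reo as
      select * from connection to teradata
         (select acct_num, reo_date
            from flat_reo
           where reo_date is not null);

   disconnect from teradata;

   /* Homogeneous join: both inputs are now SAS datasets */
   create table mtg.mydata as
      select m.ltv, r.reo_date
        from mtg.mortgage_data as m
        join work.reo as r
          on m.acct_num = r.acct_num;
quit;

/* Approach 2: load the smaller SAS dataset into the DBMS, then join there */
libname td teradata user=&userid password=&pwd server=mytdserver;

data td.mortgage_keys;
   set mtg.mortgage_data (keep=acct_num ltv);
run;

proc sql;
   connect to teradata (user=&userid password=&pwd server=mytdserver);

   /* Homogeneous join: both inputs are DBMS tables, so the join runs in-database */
   create table mtg.mydata2 as
      select * from connection to teradata
         (select k.ltv, d.reo_date
            from mortgage_keys k
            join flat_reo d
              on k.acct_num = d.acct_num);

   disconnect from teradata;
quit;
```

Which approach fits better depends largely on the relative sizes of the SAS and DBMS tables, as the plus and minus points above suggest.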

SAS® in-database

How to conduct in-database processing via SAS® Enterprise Guide® (EG)? (1 of 3)

When both of these conditions occur:

1. EG detects a DBMS table/view via a SAS® library assigned through one of these native SAS®/ACCESS® interfaces:
• SAS®/ACCESS® Interface to DB2
• SAS®/ACCESS® Interface to Oracle
• SAS®/ACCESS® Interface to Netezza
• SAS®/ACCESS® Interface to Teradata

AND

2. You reference DBMS source / input data in the query builder or task filter

SAS® in-database

How to conduct in-database processing via SAS® Enterprise Guide® (EG)? (2 of 3)

EG generates explicit SQL or pass-through SQL when working with a DBMS table / view.

Invoke via the Query Builder > Options > Options for This Query

SAS® in-database

How to conduct in-database processing via SAS® Enterprise Guide® (EG)? (3 of 3)

• Task filters apply selection criteria to wizard-driven tasks.
• Task filters boost efficiency by avoiding a separate query or filter step.
• Using task filters with DBMS causes EG to generate a WHERE clause for in-DB processing.

Goal and takeaways

By the end of this meeting, you will understand the key characteristics and capabilities of SAS/ACCESS interfaces.

• What are SAS/ACCESS interfaces?
• What capabilities do they provide?
• When should they be used? Why?
• Performance tips and hints (Hadoop, ODBC, Oracle, Netezza, SQL Server, Teradata)
• Recommended Resources

Distinguishing Implicit and Explicit / Pass-Through SQL

It depends

Implicit SQL = DBMS options on LIBNAME statement
• SAS creates a connection to the DBMS
• SAS translates your code into implicit SQL
+ you don't have to know DB-specific SQL
- can be inefficient; all or portions may not translate

Explicit SQL = DBMS options on CONNECT statement + DB-specific SQL
• SAS creates a connection to the DBMS
• You submit DBMS-specific explicit SQL to the DBMS
- you have to know DB-specific SQL
+ guarantees in-DB efficiency / no translation
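A minimal sketch of the two styles side by side, assuming a Teradata table named customers with customer and country columns. The library name myterlib and user myusr1 follow the naming used elsewhere in this deck; the password value is a placeholder.

```sas
/* Implicit SQL: DBMS options go on the LIBNAME statement.
   SAS translates the query, including UPCASE, into DBMS SQL where it can. */
libname myterlib teradata user=myusr1 password=mypwd1;

proc sql;
   select customer
     from myterlib.customers
    where upcase(country) = "USA";
quit;

/* Explicit (pass-through) SQL: DBMS options go on the CONNECT statement.
   The inner query is written in Teradata SQL and sent to the DBMS as written. */
proc sql;
   connect to teradata (user=myusr1 password=mypwd1);

   select * from connection to teradata
      (select customer
         from customers
        where upper(country) = 'USA');

   disconnect from teradata;
quit;
```

The inner query of the explicit form uses single quotes for the string literal because it must follow the DBMS's own SQL rules rather than SAS's.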

Distinguishing Implicit and Explicit / Pass-Through SQL

Pass-Through SQL enables the DBMS to optimize queries, especially when:
▪ querying, filtering, joining
▪ summarizing (such as AVG and COUNT, GROUP BY clauses)
▪ deriving variables that are created by expressions

Pass-through accepts the extensions to SQL that are provided by your DBMS

Use implicit SQL

• The LIBNAME statement must point to DBMS
• Turn on sastrace

http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a000433982.htm
http://support.sas.com/resources/papers/proceedings11/306-2011.pdf
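One common way to turn on tracing is shown below as a sketch. The library, table, and credentials are placeholders; the SASTRACE= setting shown writes the SQL that SAS sends to the DBMS into the SAS log, which helps confirm how much of a query was passed down.

```sas
/* Write the DBMS-bound SQL to the SAS log, without timestamp suffixes */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Assumption: placeholder Teradata library and table */
libname myterlib teradata user=myusr1 password=mypwd1;

proc sql;
   select count(*)
     from myterlib.customers
    where country = "USA";
quit;

/* Turn tracing off again when finished */
options sastrace=off;
```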

Distinguishing Implicit and Explicit / Pass-Through SQL

1) SAS language that has no database equivalent is processed in SAS
- does not minimize data that is returned to SAS
Avoid using SAS capabilities (functions) if they cannot be passed to the database.

2) SAS functions that have database equivalents can process in-database
- function mapping and implicit pass-through

3) Database functions process in database
- explicit pass-through

#1 - SAS language that has no database equivalent is processed in SAS
- does not minimize data that is returned to SAS, does not run in-database

#2 - SAS functions that have database equivalents can process in-database
- function mapping and implicit pass-through

[Diagram: SAS SQL code → Translation by SAS/ACCESS Interface to Teradata → Teradata SQL code → Teradata]

#2 - SAS functions that have database equivalents can process in-DB

- function mapping and implicit pass-through

SAS functions (indicated with *) are implicitly passed to Teradata

#3 - Database functions process in database
- explicit pass-through

[Diagram: Teradata SQL code → Passed verbatim by SAS/ACCESS Interface to Teradata → Teradata SQL code → Teradata]

libname myterlib teradata user=myusr1;

proc sql;
   select customer from myterlib.customers where upper(country)="USA";
quit;

The Teradata UPPER function is used instead of the SAS UPCASE function for explicit pass-through.

Mapping SAS Functions to DBMS Functions

http://support.sas.com/documentation/cdl/en/acreldb/66787/HTML/default/viewer.htm#p0f64yzzxbsg8un1uwgstc6fivjd.htm

Use In-database Procedures

Procedures:
• PROC FREQ
• PROC MEANS
• PROC RANK
• PROC SQL
• PROC SORT
• PROC REPORT
• PROC SUMMARY
• PROC TABULATE

Requires:
• Base SAS
• SAS/ACCESS to DBMS
• SQLGENERATION option and LIBNAME statement

Databases:
• Aster
• DB2
• Greenplum
• Hadoop
• Netezza
• Oracle
• Teradata

Use In-database Procedures

The LIBNAME statement must point to DBMS.

The SQLGENERATION system option or the SQLGENERATION LIBNAME option must be set to DBMS.

▪ By default, the SQLGENERATION system option is set to NONE and the Base SAS in-database procedures DO NOT RUN IN THE DATABASE.
▪ Conventional SAS processing is also used when specific procedure statements and options do not support in-database processing.

http://support.sas.com/documentation/cdl/en/lesysoptsref/66899/HTML/default/viewer.htm#n1ag2fud7ue3aln1xiqqtev7ergg.htm
http://support.sas.com/documentation/cdl/en/hostwin/63047/HTML/default/viewer.htm#p0drw76qo0gig2n1kcoliekh605k.htm
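Putting those two requirements together, a minimal sketch might look like the following. The Teradata library, credentials, and customers table are placeholders rather than names from the presentation, and whether a given step actually runs in-database still depends on the statements and options used.

```sas
/* Requirement 1: allow Base SAS procedures to generate SQL for the DBMS */
options sqlgeneration=dbms;

/* Requirement 2: the LIBNAME statement points to the DBMS.
   Assumption: placeholder credentials; SQLGENERATION= can alternatively
   be given as an option on this LIBNAME statement. */
libname myterlib teradata user=myusr1 password=mypwd1;

/* With both in place, an eligible procedure can summarize inside the
   database and return only the result to SAS */
proc freq data=myterlib.customers;
   tables country;
run;
```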

Remember this

• Conduct as much in-database processing as possible so your analytics can run faster.
• Use implicit or explicit / pass-through SQL plus 7 in-DB Base procedures.
• Minimize data returned to SAS.
• Avoid heterogeneous joins.

[Diagram: With In-Database (labels: SAS, data)]

Recommended Resources

Video
http://www.youtube.com/watch?v=OSTa1EUpKT8

Training
https://support.sas.com/edu/prodcourses.html?code=ACCESS&ctry=US

Iterative Programming In-Database Using SAS® Enterprise Guide® Query Builder
http://support.sas.com/resources/papers/proceedings14/1567-2014.pdf

Overview, documentation, training, samples and tips, conversations
http://support.sas.com/software/products/access/index.html#s1=1