Post on 28-Dec-2015

Copyright © 2004, SAS Institute Inc. All rights reserved.

Paul Kent

VP, SAS Platform Research & Development

<kent@sas.com>

Forthcoming Changes in SAS


Where do I come from?

New Hill, North Carolina

Y’all

Johannesburg, South Africa

Julle

Fareham, England

???


R & D :: Loyal Employees


R & D groups, and where I come from

Platform

Clients

Solutions

• With Analytics


What do we programmers do?

Gather Data

Organise Data

Arrange Data for consumption

Facilitate said consumption

Create understanding of Data

Promote understanding of said Data

Value


Who do we programmers do it for?

Audience Continuum

[Diagram labels: Information Consumers, Domain Experts, Power User, Business Analyst, InfoTech (Large% to Small%); Information Delivery Framework: Web Report Viewing, Web Reporting, Power Reporting, Analytic Reporting; axis: Value]


Forthcoming Improvements in the SAS Foundation

ODS (and the new ODS statistical graphics)

SAS Database Storage capabilities

The Data Step and Proc SQL

Grid Computing Capabilities

Bits and Pieces


ODS Statistical Graphics


Survival Plot Using PROC LIFETEST in SAS 8

J. Zhou, NESUG 2002

Three-page SAS program with macros

Use GPLOT and GREPLAY for graphics

Statistical Metadata

Overlaid Curves


Statistical Graphics

Essential for modern data analysis

Difficult to create in SAS prior to SAS 9

• Context lost when statistical procedure terminates

• Programmer must recreate context, metadata

Statistical procedures should automatically create graphics

Follow the 80-20 rule – 20% of these might need further tweaking, but for the most part…


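Life Is Easier in SAS 9 …

ods graphics on;                      /* enable ODS statistical graphics */
ods html file="lifetest.htm";         /* route tables and graphs to HTML */
proc lifetest data=surv;
   time surv*censor(1);               /* censor=1 marks censored observations */
   survival plots=(survival hwb);     /* survival curve plus Hall-Wellner bands */
   strata trt;
   id patient;
run;
ods html close;
ods graphics off;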


LIFETEST Procedure – Survival Plot


LIFETEST Procedure – HWB plot


Usage of ODS Statistical Graphics in SAS 9

Experimental in 30 SAS/STAT and SAS/ETS procedures - SAS 9.1

Automates creation of commonly used graphical displays for a particular analysis

Production in SAS 9.2


[Slides showing example ODS graphics output for: PROC GLM; PROC GLM (ANCOVA); GAM Procedure; HPF Procedure; KDE Procedure (two slides); LOGISTIC Procedure; MIXED Procedure (two slides); PHREG Procedure; PLS Procedure; PRINCOMP Procedure; REG Procedure; TIMESERIES Procedure; UCM Procedure]


Integration with ODS Styles

Over 30 different styles

New style elements for statistical graphics

• Fitted line

• Confidence lines and bands

• Prediction lines

• Outliers

• Classification groups


Style Demonstration

ods html file="robustreg.htm" style=journal;
ods graphics on;
title "Journal Style";
proc robustreg data=mydata plot=all;
   model y = x1 x2 x3;
run;
ods html close;

Styles shown: Journal, Analysis, Default, Statistical

(only Summary Statistics and Residual Histogram output shown)


Summary

Goal is to automate creation of graphics by statistical procedures

• Minimum work for user

• Maximum built-in functionality

Experimental in SAS 9.1

Production in SAS 9.2


SAS Transactional Storage (aka SAS Database Capabilities)

Demo Time

1. Color_table

• Remember to start your TableServer

2. Customers

• Remember to start your AppServer (tomcat5)


SAS Transactional Storage (aka SAS Database Capabilities)

A more traditional Database Capability

From SAS (not Oracle, IBM, or Microsoft)

Based on open-source “Firebird”

Real Datatypes – INT, MONEY, VARCHAR

Real Connectors – JDBC, ODBC, SAS Libname

Real Transactions – Rollback and Commit

MultiUser Server


What’s New in SAS Grid Automation

Cheryl Doninger, R&D Director, Grid Development

Roger Thompson, Relationship Manager

Merry Rabb, Product Manager, Grid


Grid Computing Market Size & Growth

Rapid Adoption of Grid Computing Based on Benefits


Grid Adoption is Increasing

A high percentage of firms using analytical applications are considering grid

2/3 of firms surveyed are using or considering grid technology


Benefits of Grid Computing

Faster results

More executions – more data

Time to recover from errors

Better use of resources

Virtualize resources

Incremental IT spend


Types of Applications Suitable for Grid

Long running

Many replicate runs of same fundamental task

• simulation (what if analysis)

• optimization (testing lots of scenarios)

• BY GROUP processing

• data segmentation

Independent tasks running against large data sources

• scoring – risk analysis

• multiple procedures and data steps
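
As a rough illustration of the BY GROUP and data segmentation cases listed on this slide, the sketch below uses SAS/CONNECT to run the same step for two groups in parallel sessions on one machine. It is not taken from the slides and rests on several assumptions: SAS/CONNECT is licensed, the session names grp_f and grp_m are hypothetical, and SASHELP.CLASS stands in for a much larger table.

```sas
/* Sketch only: one analysis split by group across two local SAS sessions. */
options sascmd="!sascmd";        /* start child sessions with the same command as this session */

signon grp_f;                    /* hypothetical session names */
signon grp_m;

rsubmit grp_f wait=no;           /* wait=no returns control immediately */
   proc means data=sashelp.class;
      where sex = 'F';
      var height weight;
   run;
endrsubmit;

rsubmit grp_m wait=no;
   proc means data=sashelp.class;
      where sex = 'M';
      var height weight;
   run;
endrsubmit;

waitfor _all_ grp_f grp_m;       /* block until both groups have finished */

rget grp_f;                      /* bring back the spooled log and output */
rget grp_m;

signoff grp_f;
signoff grp_m;
```

The two groups are independent, so neither session waits on the other; that independence is what makes this kind of job a candidate for a grid.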


SAS Grid Strategy

Infrastructure benefits SAS applications

• large data / complex algorithms

Focus areas

• Development

• Run-time

• System management

Incremental Releases


SAS Grid Roadmap Phase I

SAS 8.2 functionality

• %Distribute

• SAS/CONNECT

• SAS log


SAS Grid Success Stories

Texas Tech University

Statistics Canada

Large Pharmaceutical Company


SAS Grid Roadmap Phase II

SAS 9.1.3 Q3/2005 functionality

• smarter engines for SAS IDEs

• SAS/Platform integration

• SASMC monitoring


Business Analytics - Enterprise Miner on SMP


Business Analytics - Enterprise Miner on Grid


Data Integration – ETL Studio on SMP/Grid


Data Integration – ETL Studio on SMP/Grid


SAS Stored Process

Business Intelligence – Enabled on SMP/Grid

SAS Program

ETL Studio

Enterprise Miner

Web Services


Grid Manager Plugin – job view


Grid Manager Plugin – host view


SAS 9 Grid Computing Components

Multiple Components Working Together to Provide Grid Computing

[Diagram labels: SAS Applications (Enterprise Miner, Stored Processes, Data Integration); SAS 9 Grid Computing; Grid Enabled Code Generation; Session Spawning; Distribution; Piping; SAS Connect; Multi-Processor SAS; Grid Manager Plug-in; Grid Monitoring; Grid Management; Job Termination; Platform Suite for SAS; Dynamic Load Balancing; Job, Queue & Host Management; NEW September 2005]


General Layout of a SAS Grid

[Diagram labels: Client Machine; Metadata Server; Grid Control Machine; Grid Node (three shown, through n); LSF (three shown); SAS Grid Mgr plugin; Platform Suite for SAS; SAS ETL; SAS EM; SAS Foundation]


Grid Work Flow…

[Diagram labels: ETL Studio; Enterprise Miner; Connect Client; Workspace Server; Metadata Server; SAS MC; LSF; Node1, Node2, Node3 … n]

LSF Cluster File

Node1 ! ! 1 () (SASMain)
Node2 ! ! 1 () ()
Node3 ! ! 1 () (SASMain)

SAS Metadata: SASServers; SASMain – Server Context; Platform Server Component; sas -noobjectserver

session resource sascmd wl options
----------------------------------
p1 SASMain sas -noobjectserver

grdsvc_enable(p1, "resource=SASMain");
signon p1;


Partitioning the Grid…

[Diagram labels: ETL Studio; Enterprise Miner; EM grid; ETL grid; Connect Client; Workspace Server; Metadata Server; SAS MC; LSF; Node1, Node2, Node3 … n]

LSF Cluster File

Node1 ! ! 1 () (SASMain,EM)
Node2 ! ! 1 () (SASMain,EM,ETL)
Node3 ! ! 1 () (SASMain,ETL)

SAS Metadata: SASServers; SASMain – Server Context; Platform Server Component; sas -noobjectserver; EM, ETL

session resource sascmd wl options
----------------------------------
p1 SASMain sas -noobjectserver ETL

grdsvc_enable(p1, "resource=SASMain, workload=ETL");
signon p1;
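
The grdsvc_enable and signon calls on these slides can be placed in a complete program roughly as follows. This is a sketch under stated assumptions, not the presenter's code: it calls the function through %SYSFUNC from open code, it assumes the metadata connection options (METASERVER=, METAPORT= and so on) are already set for the session, it uses only the resource= setting from the first slide, and the submitted step is a placeholder.

```sas
/* Sketch only: let the grid choose the node for session p1. */
%let rc = %sysfunc(grdsvc_enable(p1, resource=SASMain));
%put NOTE: grdsvc_enable returned &rc;   /* inspect the return code before continuing */

signon p1;                    /* the grid, not the program, decides where p1 runs */

rsubmit p1 wait=no;
   /* placeholder work; any independent step could go here */
   proc means data=sashelp.class;
      var height weight;
   run;
endrsubmit;

waitfor p1;                   /* block until the remote step completes */
rget p1;                      /* retrieve the spooled log and output */
signoff p1;
```

Nothing in the submitted step names a host. Node selection comes from the resource entries in the LSF cluster file shown on the slide, which is why the grid can be repartitioned without editing the program.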


Grid Provides: Speed and Efficiency


Analytics are working, so people…

Build more models

• For successively refined segments of customers

Use more data in those models

Integrate the results into operational systems

• <near real time>

A SAS 9.2 DATA step movie


Implications

More multithreading enablement within SAS

Yes, even the DATA STEP

Saved Programs

Multi-Threaded Server Capabilities

• Same model, parallel data for throughput

• Many models, same data – one-off scores in operational systems

Models Management can deploy models to “score servers” without restarting them


Bits and Pieces

Reverse Engineer SAS jobs

Checkpoint and Restart SAS jobs

Encode (and protect) your SAS jobs

ZIP functions

CRC …


Protect your IP

PROC SCRAMBLE
   file='myfile.sas'
   outfile='secret.sas' <expire=> <site=> …
   ;

Send secret.sas to your customers

%include 'secret.sas';

• Implies nosource; your macros can reset NOMPRINT…


Checkpoint/Restart and Parallelization Features in the Core Supervisor

Rick Langston, Core Systems Department


Checkpoint/Restart

Craig R.’s request as per user community

Job fails – want to restart where it left off

ETL Studio also wanted a restart facility


A simple solution

Record a checkpoint number, save it in WORK

If restarting, skip PROC / DATA steps to there

Tokenize everything

Execute all global statements


To set up for checkpointing

Use NOWORKINIT, NOWORKTERM

Have WORK refer to a permanent directory

Use the CHECKPOINT option


Subsequent restarting

Again use NOWORKINIT, NOWORKTERM

Again point WORK to the permanent directory

Use the RESTART option

Job will restart as of the last successful step


Is this what users want?

We can’t do this without the user being proactive

“data temp; set temp;” issues (a step that overwrites its own input)

skipped steps may need to be executed

Output files (flat files – DISP=MOD, databases…)


EXECUTE_ALWAYS

CHECKPOINT / EXECUTE_ALWAYS;

Use it for a step that must be executed

For example, SYMPUT and CALL EXECUTE
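
A step that sets a macro variable is the typical case: macro variables live in the session, not in WORK, so they are gone when the job is restarted. The sketch below marks such a step so that it always runs. It is a hedged illustration: the slide writes the statement with a slash, while the form used here, CHECKPOINT EXECUTE_ALWAYS;, is the spelling documented for the SAS 9.2 release, and the deck itself says names were still being decided. The data set work.results and the macro variable run_date are hypothetical.

```sas
/* Sketch only: force the next step to run even when restarting. */
checkpoint execute_always;       /* applies to the step that immediately follows */
data _null_;
   /* recreate the macro variable on every run, including a restart */
   call symput('run_date', put(today(), date9.));
run;

/* A later step can rely on &run_date whether or not earlier steps were skipped. */
proc print data=work.results;    /* hypothetical data set kept in the permanent WORK */
   title "Report generated &run_date";
run;
```

The same marking would be needed for a step that uses CALL EXECUTE to generate code, since the generated statements are likewise not preserved between runs.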


Example

Using options debug='checkpoint-implicit';

Option names still to be decided


data temp1; x=1; run;
data temp2; x=2; run;
data temp3; x=3; run;
data _null_;
   if "&sysparm."="1"
      then abort abend 999;
run;
data temp4; x=4; run;


Invoke once with checkpoint-implicit

Then reinvoke with restart-implicit


Additional info

Planned for 9.2

Option names still being decided

Wanting additional input


Parallelization Efforts

Reading in arbitrary SAS code

Producing metadata in comments

This could be post-processed by ETL Studio

This could be post-processed by Grid Computing


Parallelization Efforts

Researching so far

Hooks in dependency opens

Catalogs, flat files, SAS data sets, etc.

Emitting info in comments

Example of use


Exposure to User

New option, such as DEPMETA=fileref

SAS program with comments written to this file


Questions/comments?


Ideas for the Future!

How can the software learn?

So the user doesn’t have to learn about the software; they can learn the business!

Some future ETL Studio job

• Remembers data volumes from last week’s run

• Uses that memory to choose a better strategy


Your Turn!!

You tell me next time SAS forgets something it should have remembered

And why remembering that would help SAS improve next time

< Paul.Kent@sas.com >

Thanks for listening!