A Beginner's Guide to Get Started With SAP Predictive ...

A Beginner’s Guide to Get Started with SAP Predictive Analytics on SAP HANA

Applies to:

SAP Predictive Analytics 2.4 and above

SAP HANA SPS10 (rev 102.2) and above

Automated Predictive Library (APL) 2.4.10

Summary

This whitepaper describes the technical implementation steps of SAP Predictive Analytics for SAP HANA and elaborates on how you can leverage native SAP HANA capabilities from SAP Predictive Analytics.

Company: SAP Labs LLC

Created on: February 2016

Author Bio

Debraj Roy is a Senior BI Product Manager within the SAP Analytics Product Management Team. He works as a Product Manager for SAP Predictive Analytics and is an SAP Certified SAP HANA specialist.

Table of Contents

1. INTRODUCTION

1.1 Objective

2. BUSINESS BENEFITS EXPECTED FROM SAP PREDICTIVE ANALYTICS FOR SAP HANA

2.1 Automating predictive modeling process in SAP HANA

2.2 Generate in-Database predictive scoring

2.3 Operationalize predictive scoring

2.4 Predictive model management using Model Manager

2.5 Create analytical data sets automatically – with few clicks

2.6 Experience the power of Link Analysis and advanced personalization

3. ARCHITECTURE & SECURITY

3.1 Client Server Architecture

3.2 Desktop Application Architecture

4. SAP PREDICTIVE ANALYTICS DEPLOYMENT ON SAP HANA

4.1 Automated Analytics – supported SAP HANA artifacts

4.2 Automated Analytics Connectivity

4.1.1 Automated Analytics work flow on SAP HANA datasets

4.1.2 Prepare the dataset

4.1.3 Train the data set

4.2 Automated Predictive Library (APL) functions

4.2.1 Checking APL installation

4.2.2 Privileges and Security

4.2.3 Usage of APL functions in Automated Analytics

4.2.4 Usage of APL functions in SAP HANA studio (SQLScript)

4.2.5 Usage of prepackaged APL stored procedures

4.3 Expert Analytics

4.3.1 Expert Analytics work flow on HANA datasets

4.3.2 Prepare dataset

4.3.3 Predict results

4.3.4 Auto generated reports

4.3.5 Exporting predictive models in SAP HANA

4.3.6 Usage of advanced SAP HANA AFL (application function library) based business functions

4.4 Automated Analytics or Expert Analytics: what to choose when?

4.5 Model Manager

4.4.1 Import Models

4.4.2 Model Management on SAP HANA

4.4.3 Auto generated reports

Summary

Related Content

Copyright

1. INTRODUCTION

1.1 Objective

The purpose of this document is to familiarize you with the various components of SAP Predictive Analytics for SAP HANA and to give you an overview of their implementation details. It is highly recommended that you first refer to SAP Note 2215245, which describes the current supportability of SAP HANA artifacts with the various SAP Predictive Analytics components.

1.2 Components of SAP Predictive Analytics for HANA

SAP Predictive Analytics for SAP HANA is the complete set of tools and components for end-to-end automated predictive modeling on SAP HANA. It includes Automated Analytics, Expert Analytics, Model Manager, and access to the SAP HANA-native APL¹ (Automated Predictive Library) functions in a single desktop or client/server installation.

1.3 Feature overview of SAP Predictive Analytics for SAP HANA

You can run automated predictive analysis using Automated Analytics or Expert Analytics by choosing SAP HANA tables or SAP HANA calculation and analytic views. (Attribute views are not shown to users browsing SAP HANA information views in the exploration screen, but they can be configured to work with Automated Analytics. For more information on how to configure attribute views, see SAP Note 2200360.)

With a few clicks, you can generate predictive models by using Automated Analytics and persist them in the SAP HANA database for later use.

You can publish the predictive models by writing SAP HANA procedures that include APL functions in SAP HANA studio.

You can create predictive models using APL, PAL², or R³ functions and export them as stored procedures from Expert Analytics. Those stored procedures can be reused as predictive models on new datasets.

Using Model Manager, you can load a predictive model created in SAP HANA and perform various model management activities, such as retraining the model and analyzing model performance deviations.

¹ APL (Automated Predictive Library) is a native C++ implementation of the automated predictive capabilities of SAP Predictive Analytics, running directly in SAP HANA.

² The Predictive Analysis Library (PAL) defines native SAP HANA functions that can be called from within SQLScript procedures to perform analytic algorithms.

³ R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

2. BUSINESS BENEFITS EXPECTED FROM SAP PREDICTIVE ANALYTICS FOR SAP HANA

Let’s review the opportunities and return on business investments that you can expect from the implementation of SAP Predictive Analytics on SAP HANA.

2.1 Automating predictive modeling process in SAP HANA

You can create a real-time predictive modeling environment on SAP HANA for both business analysts and data scientists. Empower business analysts by providing them with the automated tools they need, such as Automated Analytics, the APL functions, and Model Manager, to build sophisticated predictive models for every data mining function thinkable in days, not weeks or months.

You can leverage the SAP HANA-native predictive components to boost the performance of your predictive applications: R scripts; the Predictive Analysis Library (PAL) and the Business Function Library (BFL), both based on the Application Function Library (AFL⁴) framework; the Unified Demand Forecasting (UDF) function; the SAP HANA Sentiment Analysis function; the Business Optimization function; and the Automated Predictive Library (APL) functions.

Model Manager can be triggered automatically to check for possible deviations of the model and to retrain it if performance decreases.

⁴ The SAP HANA Application Function Library (AFL) consists of application functions and a framework for grouping them together. The Predictive Analysis Library (PAL) and the Business Function Library (BFL) are part of the base AFL package.

2.2 Generate in-Database predictive scoring

Example of identifying hot sales leads:

Fit Score (also referred to as an explicit score)

The fit score is intended to capture how closely an incoming prospect resembles a likely buyer. For example, you might look at company size, geographic location, industry, and job title to determine whether the lead is a fit.

Behavioral Score (also referred to as an implicit score)

The behavioral score is intended to capture how engaged a prospect is with your company. This could include the lead’s website visits, form completions, email clicks, and perhaps even application usage data.

Identify hot leads

Combining both scores helps you identify the hot leads. In the above case, the overall score, which is a blend of the fit score and the behavioral score, is highest in the first case, so the sales leads falling into the first category can be considered hot leads.
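The blending of a fit score and a behavioral score can be sketched in a few lines of Python. This is only an illustration, not how SAP Predictive Analytics computes its scores: the 0 to 100 scale, the equal weighting, the threshold of 70, and the names `Lead`, `overall_score`, and `hot_leads` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    name: str
    fit_score: float         # explicit score, assumed to be on a 0-100 scale
    behavioral_score: float  # implicit score, assumed to be on a 0-100 scale


def overall_score(lead, fit_weight=0.5):
    """Blend both scores; the 50/50 weighting is an assumption."""
    return fit_weight * lead.fit_score + (1.0 - fit_weight) * lead.behavioral_score


def hot_leads(leads, threshold=70.0, fit_weight=0.5):
    """Return the leads whose blended score reaches the (assumed) threshold, best first."""
    scored = [(overall_score(lead, fit_weight), lead) for lead in leads]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [lead for score, lead in scored if score >= threshold]


if __name__ == "__main__":
    leads = [
        Lead("A", fit_score=90, behavioral_score=80),  # good fit and engaged
        Lead("B", fit_score=90, behavioral_score=20),  # good fit, not engaged
        Lead("C", fit_score=30, behavioral_score=85),  # engaged, poor fit
    ]
    for lead in hot_leads(leads):
        print(lead.name, overall_score(lead))
```

A lead that is strong on only one of the two dimensions stays below the threshold in this sketch, which is the point of combining both scores.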

Improving the quality of the predictive models through in-Database scoring

You can improve the quality of your business models by embedding predictive results from in-database predictive scoring generated through the SAP HANA-native APL, PAL, and R functions. Using Automated Analytics, you can view each variable’s contribution to a business model, simulate a model run or apply the model to an input dataset, predict scores in real time, and persist the model and score results in database tables or views.

You can also export the model as a stored procedure in SAP HANA using Expert Analytics.

2.3 Operationalize predictive scoring

You can operationalize predictive scoring by saving the predictive models in SAP HANA database tables, in supported information views⁵, or as stored procedures. You can also embed them into Business Intelligence (BI) workflows or directly into any other application (using Java or C++) to operationalize the results.

2.4 Predictive model management using Model Manager

Model Manager (a thin web client within the SAP Predictive Analytics solution) lets you schedule model refresh tasks, address real-time scoring needs, and manage a potentially large number of predictive models across the entire enterprise.

⁵ SAP HANA information views: three types of information views are available in SAP HANA: attribute views, analytic views, and calculation views. All three types are non-materialized views.

2.5 Create analytical data sets automatically – with few clicks

Expert Analytics

You can prepare data without any coding. Using the Prepare room within Expert Analytics, you can define a broad set of reusable business rules to create analytical data sets automatically, which you can then use for business modeling. With this approach, you can analyze data faster and achieve results with far less human error than with traditional, handcrafted techniques.

Automated Analytics

In Automated Analytics, the Data Manager option lets you manipulate the information stored in SAP HANA tables. Additionally, you can create analytical datasets by joining multiple relational SAP HANA tables and save them back to the SAP HANA system so that they become new, enriched, and dynamic data sources. The analytical datasets can be reused and repopulated to retrain models over time.

2.6 Experience the power of Link Analysis and advanced personalization

A social network is a structure represented in the form of a graph, composed of nodes and links.

Social network structure

You can explore the links between your customers and their network of strong social influencers within Automated Analytics by building social networks and performing link analysis on the information stored in SAP HANA information views (calculation, analytic, and configured attribute views⁶), SAP HANA tables, or other big data sources such as Hadoop. You can create a graph for every possible type of product or entity association, and use the dataset to achieve advanced personalization that helps improve customer loyalty.

⁶ Attribute views are not shown when browsing SAP HANA information views but can be configured to work with Automated Analytics. For more information, see SAP Note 2200360.

3. ARCHITECTURE & SECURITY

Below are the two modes of deployment that are possible with SAP Predictive Analytics on SAP HANA.

3.1 Client Server Architecture

The production deployment of SAP Predictive Analytics typically has a three-tier client-server architecture. With this configuration, you can install the following server-based components:

Java Web Start, a server-based client software deployment tool.

Model Manager for automating modeling tasks.

The Automated Analytics client application communicates with the server module, and the data source is usually a supported database, such as SAP HANA, or a file system. For each client connection, a new Automated Analytics instance process is started on the server. Depending on the server configuration, the process can be started with a specific system account or with the user account. Communication between the clients and the server is encrypted using SSL.

For Expert Analytics, you still need the SAP Predictive Analytics desktop version, because the SAP Predictive Analytics client does not support Expert Analytics.

3.2 Desktop Application Architecture

You can install the SAP Predictive Analytics desktop on your local Windows machine and connect to SAP HANA using the Automated Analytics or Expert Analytics user interface.

Automated Analytics supports APL delegation for most predictive model creation. Refer to section 4.2.3, Usage of APL functions in Automated Analytics, for more information.

As of SAP Predictive Analytics 2.4, multi-tenancy deployments of SAP HANA are not supported by either Expert Analytics or Automated Analytics, in both the desktop and the client versions of SAP Predictive Analytics.

4. SAP PREDICTIVE ANALYTICS DEPLOYMENT ON SAP HANA

In this section you will get acquainted with the techniques for performing predictive analysis in SAP HANA using Automated Analytics, Expert Analytics, Model Manager and APL (Automated Predictive Library) functions.

4.1 Automated Analytics – supported SAP HANA artifacts

List of SAP HANA objects that can be used in Automated Analytics

You can train and apply a model against SAP HANA calculation views, analytic views, and tables from:

SAP HANA (including Suite 4 HANA Live views)

SAP HANA custom applications (XS application views)

SAP Business Warehouse (BW) on HANA

Only analytic and calculation views can be selected as data sources. Attribute views are not shown when browsing SAP HANA information views, but they can be configured to work with Automated Analytics. For configuration details, see SAP Note 2200360.

Analytic and calculation views that use the variable mapping feature are not supported. The variable mapping feature is available starting with SAP HANA SPS 09.

4.2 Automated Analytics Connectivity

Automated Analytics connects to the SAP HANA calculation, analytic, and configured attribute views⁷ via the SAP HANA ODBC driver. You need to create a local ODBC connection on your machine (or on the server where the SAP Predictive Analytics server is installed).

Once the SAP HANA server has authenticated you based on your user authorization profile, you can choose SAP HANA calculation, analytic, or configured attribute views from the Content folder, or SAP HANA tables from the Catalog folder, for your analysis.

⁷ Configured attribute views: refer to section 4.1, Automated Analytics – supported SAP HANA artifacts, for more information.

STEP1: Define an ODBC source that points to the SAP HANA server

STEP2: Select the ODBC source corresponding to SAP HANA server

STEP3: Enter the SAP HANA database user name and password

STEP4: Click Connect
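If you want to check the ODBC source outside the Automated Analytics user interface, the same information (data source name, user name, password) can be used from a script. The sketch below is only an illustration under assumptions: the helper names are hypothetical, the DSN name `HANA_PA` and the credentials are placeholders, and `connect` relies on the third-party `pyodbc` package, which is not part of SAP Predictive Analytics. `DSN`, `UID`, and `PWD` are the standard ODBC connection string keywords.

```python
def build_connection_string(dsn, user, password):
    """Build an ODBC connection string from the values entered in STEP2 and STEP3."""
    for label, value in (("dsn", dsn), ("user", user), ("password", password)):
        if not value:
            raise ValueError(f"{label} must not be empty")
        if ";" in value:
            # Keep the sketch simple: values that need ODBC escaping are rejected.
            raise ValueError(f"{label} must not contain ';'")
    return f"DSN={dsn};UID={user};PWD={password}"


def connect(dsn, user, password):
    """Open a connection through the ODBC source (assumes pyodbc is installed)."""
    import pyodbc  # third-party package, imported lazily so the helper above has no dependency

    return pyodbc.connect(build_connection_string(dsn, user, password))


if __name__ == "__main__":
    # Placeholder values; replace them with your own ODBC source and credentials.
    print(build_connection_string("HANA_PA", "ANALYST", "secret"))
```

A successful `connect` call only confirms that the ODBC source and credentials work; which views and tables you can then see still depends on your SAP HANA authorizations.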

Using Automated Analytics, the following predictive models can be generated:

Classification/Regression Model

Clustering Model

Time Series Model

Association Rules

Social Network Analysis

Colocation Analysis

Frequent Path Analysis

New Recommender Model

Input parameters and variable support for HANA information views.

Automated Analytics supports the input parameters and variables defined in SAP HANA calculation, analytic, and configured attribute views (see SAP Note 2200360). The user can choose the values of the variables or input parameters, so that only a subset of the result set from the SAP HANA views is selected in Automated Analytics.

STEP5: In Automated Analytics, the SAP HANA views are found under the node “Content” and tables are found under “Catalog”

Map Input Parameters or Variables of External Views for Value Help

Mapping the parameters of the current view to the parameters of the underlying data sources pushes the filters down to those data sources at runtime, which reduces the amount of data transferred between them. For value help from external views, you can also map variables from the current view to the external views, in addition to the parameters. Analytic, calculation, and configured attribute views (see SAP Note 2200360) that use the variable mapping feature are supported in Automated Analytics starting with SAP HANA SPS 09.

4.1.1 Automated Analytics work flow on SAP HANA datasets

The flow diagram below shows the phases of the predictive modeling process on SAP HANA using Automated Analytics. To illustrate, we assume that the predictive model is either a classification model or a regression model.

4.1.2 Prepare the dataset

In the Automated Analytics Data Manager section, you can perform data manipulation on the data stored in SAP HANA tables.

Data Manipulation Editor:

Using the Data Manipulation Editor, you can:

Define a calculated field, new aggregate, condition, or lookup table.

Create a new normalization on a specific field or define a new SQL expression.

Merge multiple tables using the merge functionality.

Set filters and prompts on the SAP HANA tables.

Data manipulation empowers users to prepare the training dataset and format it with ease.

Data Manager:

In Data Manager, you can create analytical datasets by relating multiple SAP HANA tables and persist them in the database for predictive analysis. An analytical dataset is not a one-shot extraction: each time you use it in model training, it is dynamically populated based on the reference date.
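To make the idea of a dataset that is repopulated from a reference date more concrete, here is a minimal sketch in plain Python. It does not use Data Manager or SAP HANA; the table layouts, the 90-day window, and the function name `build_analytical_dataset` are assumptions chosen for illustration only.

```python
from datetime import date, timedelta


def build_analytical_dataset(customers, transactions, reference_date, window_days=90):
    """Join customers to their transactions in the window before reference_date.

    The 90-day window and the column names are illustrative assumptions.
    """
    window_start = reference_date - timedelta(days=window_days)
    rows = []
    for customer in customers:
        amounts = [
            t["amount"]
            for t in transactions
            if t["customer_id"] == customer["customer_id"]
            and window_start <= t["date"] < reference_date
        ]
        rows.append(
            {
                "customer_id": customer["customer_id"],
                "region": customer["region"],
                "txn_count": len(amounts),   # aggregate recomputed for each reference date
                "txn_total": sum(amounts),
            }
        )
    return rows


if __name__ == "__main__":
    customers = [{"customer_id": 1, "region": "EU"}, {"customer_id": 2, "region": "US"}]
    transactions = [
        {"customer_id": 1, "date": date(2016, 1, 10), "amount": 100.0},
        {"customer_id": 1, "date": date(2015, 9, 1), "amount": 50.0},
        {"customer_id": 2, "date": date(2016, 1, 20), "amount": 30.0},
    ]
    # The same definition yields different rows for different reference dates.
    print(build_analytical_dataset(customers, transactions, date(2016, 2, 1)))
    print(build_analytical_dataset(customers, transactions, date(2015, 10, 1)))
```

The definition of the dataset stays the same while its content follows the reference date, which is why the same analytical dataset can be reused when a model is retrained later.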

4.1.3 Train the data set

You can train a predictive model based on sample data stored in any of the SAP HANA tables or views by selecting it from the drop down list.

While training your model, you need to choose a cutting strategy.

A cutting strategy is a technique that decomposes a training dataset into three distinct sub-sets:

1. An estimation sub-set

2. A validation sub-set

3. A test sub-set

Automated Analytics requires two mandatory sub-sets (estimation and validation) and one optional sub-set (test) for predictive modeling.

With the exception of the customized cutting strategy where you define your own data sub-sets in 3 separate files, cutting strategies are automatic in Automated Analytics. Automatic cutting strategies operate upon a single data file, which constitutes your initial dataset.
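As an illustration of what an automatic cutting strategy does with a single initial dataset, the following sketch splits a list of rows at random into the three sub-sets. The 60/20/20 proportions, the fixed seed, and the function name `random_cut` are assumptions for this example; they are not the proportions or the algorithm used by Automated Analytics.

```python
import random


def random_cut(rows, estimation=0.6, validation=0.2, seed=42):
    """Randomly split rows into estimation, validation and test sub-sets.

    Whatever is left after the estimation and validation shares goes to test.
    """
    if estimation <= 0 or validation <= 0 or estimation + validation > 1:
        raise ValueError("estimation and validation must be positive and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the cut is reproducible
    n_est = round(len(shuffled) * estimation)
    n_val = round(len(shuffled) * validation)
    return {
        "estimation": shuffled[:n_est],
        "validation": shuffled[n_est:n_est + n_val],
        "test": shuffled[n_est + n_val:],  # may be empty: the test sub-set is optional
    }


if __name__ == "__main__":
    parts = random_cut(range(100))
    print({name: len(part) for name, part in parts.items()})
```

Setting `estimation` and `validation` so that they sum to 1 leaves the test sub-set empty, which mirrors the fact that only the first two sub-sets are mandatory.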

To generate predictive models, there are nine cutting strategies that you may use as below:

4.1.3.1 Model Overview Summary

After each run, Automated Analytics generates summary reports indicating the Predictive Power and Prediction Confidence of the analysis.

Predictive Power (also known as KI): This is the quality indicator of the models generated by Automated Analytics. It corresponds to the proportion of information contained in the target variable that the explanatory variables are able to explain.

Prediction Confidence (also known as KR): This is the robustness indicator of the models generated by Automated Analytics. It indicates the capacity of the model to achieve the same performance when it is applied to a new dataset exhibiting the same characteristics as the training dataset.

4.1.3.2 Analyze the result set

To explain the result set in this document, we assume that the generated predictive model is either a classification model or a regression model.

You can now look at the automatically generated reports to judge whether the predictive model generated in SAP HANA is robust enough to be used in real business scenarios. You can analyze the generated predictive model using:

Model Overview Contributions by Variables Statistical Reports Confusion Matrix Model Graphs Category Significance Scorecard Tiles

In this white paper we will discuss only Model Graphs, Contributions by Variables and Confusion Matrix reports.

Model Graphs

You can review the model’s performance using “Model Graphs”.

Depending on the type of the target, the model graph plot allows you to:

View the realizable profit that pertains to your business issue using the model generated, when the target is nominal.

Compare the performance of the model generated with that of a random model and that of a hypothetical perfect model, when the target is nominal.

Compare the predicted value to the actual value, when the target is continuous.

On the plot, for each type of model, the curves represent:

When the target is nominal, the realizable profit (on the Y axis) as a function of the ratio of the observations correctly selected as targets relative to the entire initial dataset (on the X axis).

When the target is continuous, the predicted value or score (on the X axis) with respect to the actual value or target (on the Y axis).

The Model Graph shows 3 curves:

The default parameters display the profit curves corresponding to the Validation sub-set (blue line), the hypothetical perfect model (Wizard, green line) and a random model (Random, red line). The default setting for the type of profit parameter is the Detected Profit, and the values of the horizontal axis are provided in the form of a percentage of the entire dataset.

Random (in red): represents the profit that may be achieved using a random model that does not allow one to know even a single value of the target variable for any observation of the dataset. It is like flipping a coin for every case: to find all the true cases, you need to go through everyone.

Wizard (in green): represents the profit that may be achieved using the hypothetical perfect model created by Automated Analytics, which allows one to know with absolute confidence the value of the target variable for each observation of the dataset.

Validation (in blue): represents the profit that may be achieved using the model generated by Modeler (on either SAP HANA or supported non-HANA systems), which allows one to perform the best possible prediction of the value of the target variable for each observation of the dataset.

Therefore, the closer the blue curve is to the green curve, the more accurate the model is.
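The relationship between the three curves can be reproduced on a toy dataset. The sketch below computes a detected-profit curve (the share of true targets found as observations are taken in decreasing score order) for a model, for the perfect "wizard" ordering, and for a random selection. The area-based `closeness_to_wizard` ratio is an illustrative summary chosen for this example and is not claimed to be the formula the product uses for its indicators; all function names are hypothetical.

```python
def detected_profit_curve(scores, targets):
    """Share of positive targets found after selecting the top 0, 1, ..., n observations."""
    total = sum(targets)
    if total == 0:
        raise ValueError("at least one positive target is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = [0.0]
    found = 0
    for i in order:
        found += targets[i]
        curve.append(found / total)
    return curve


def random_curve(n):
    """A random selection finds positives in proportion to the share of data selected."""
    return [k / n for k in range(n + 1)]


def area_under(curve):
    """Trapezoidal area with the X axis scaled to the range 0..1."""
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(curve, curve[1:]))


def closeness_to_wizard(scores, targets):
    """0.0 means no better than random, 1.0 means identical to the perfect model."""
    model = area_under(detected_profit_curve(scores, targets))
    wizard = area_under(detected_profit_curve(targets, targets))  # perfect ordering
    rand = area_under(random_curve(len(targets)))
    if wizard == rand:
        raise ValueError("targets must contain both categories")
    return (model - rand) / (wizard - rand)


if __name__ == "__main__":
    targets = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(detected_profit_curve(scores, targets))
    print(closeness_to_wizard(scores, targets))
```

In this toy example the model ranks one negative case above a positive one, so its curve lies between the random and wizard curves.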

Contributions by Variables

You can evaluate the Contributions by Variables chart to set values and assess the impact.

You can also double-click a blue bar for more details on that variable’s contribution.

Top 5 Variable Contributions on the Predictive Model

Confusion Matrix

The Confusion Matrix report allows you to visualize the target values predicted by the model compared with the real values. You can simulate your “profit” (benefit) for a selected threshold score, or automatically adapt the threshold to obtain the maximum profit (what-if analysis).
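The what-if analysis described above can be sketched as follows: count the four cells of the confusion matrix for a given threshold, attach a gain to each correctly detected target and a cost to each false alarm, and search for the threshold with the highest profit. The gain and cost values, the rule that a score strictly above the threshold counts as a predicted target, and all function names are assumptions for this example.

```python
def confusion_matrix(scores, actuals, threshold):
    """Count true/false positives and negatives for one threshold."""
    matrix = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, actual in zip(scores, actuals):
        predicted = score > threshold  # assumed rule: strictly above the threshold
        if predicted and actual:
            matrix["tp"] += 1
        elif predicted and not actual:
            matrix["fp"] += 1
        elif actual:
            matrix["fn"] += 1
        else:
            matrix["tn"] += 1
    return matrix


def profit(matrix, gain_per_tp=10.0, cost_per_fp=2.0):
    """Simulated benefit; the gain and cost figures are illustrative assumptions."""
    return matrix["tp"] * gain_per_tp - matrix["fp"] * cost_per_fp


def best_threshold(scores, actuals, gain_per_tp=10.0, cost_per_fp=2.0):
    """Try every observed score (and 'select everything') and keep the most profitable."""
    candidates = [float("-inf")] + sorted(set(scores))
    return max(
        candidates,
        key=lambda t: profit(confusion_matrix(scores, actuals, t), gain_per_tp, cost_per_fp),
    )


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.2]
    actuals = [1, 0, 1, 0]
    print(confusion_matrix(scores, actuals, 0.5))
    threshold = best_threshold(scores, actuals)
    print(threshold, profit(confusion_matrix(scores, actuals, threshold)))
```

Changing the gain and cost figures shifts the most profitable threshold, which is the kind of trade-off the report lets you explore interactively.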

4.1.3.3 Applying/Running the model.

This section within Automated Analytics can be used to apply the model to a new dataset.

Predictive model generation options.

When applying the predictive model, you can select one of the generation options listed below within the Automated Analytics engine. Depending on the predictive model and the option chosen in the “Generate” pull-down menu, the generated data will contain different types of results. For example, for a Classification/Regression predictive model, the generation options are as follows:

Type of Results and Descriptions

Predicted value only

Corresponds to the value predicted by the model for the target variable of each observation. Generates a results file containing the following information:

● The predicted value (rr_<target variable name>)

Probability

Corresponds to the probability that each observation belongs or does not belong to the target category of the target variable. Generates a results file containing the following information:

● The predicted value

● The probability (proba_rr_<target variable name>)

● The prediction range (bar_rr_<target variable name>)

Individual Contributions

The individual contributions of the variables contained in the dataset with respect to the target variable. The sum of all those individual contributions corresponds to the predicted value (score) to the nearest whole number. Generates a results file containing the following information:

● The individual contributions of variables (contrib_VariableName_rr_<target variable name>)

Decision

The "decision" option can only be used for classification models, that is, when the target variable is nominal. The decision is made on the basis of a threshold that is applied to the scores generated by the model. The target category of the target variable is assigned to observations whose scores are above the threshold. The default threshold (computed during the generation, or training, of the model) is chosen so that the way the categories of the target variable are assigned to observations is representative of their distribution in the training dataset. Generates a results file containing the following information:

● The predicted value

● The decision (decision_rr_<target variable name>)

● The decision probability (proba_decision_rr_<target variable name>)

● The probability

Advanced Apply Settings

This option allows you to select the outputs you want to see in the results file.

Use direct apply in the database

When this option is checked, the optimized scoring mode In-database Apply is used and the data is generated directly in the database.

Add Score Deviation

This option allows you to check the deviations for each variable and each variable category between the model and the input dataset used for the model application.
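The threshold behind the Decision option can be illustrated with a small sketch: pick the threshold so that the share of observations whose score lies above it matches the share of the target category seen in training, then assign the target category to those observations. This is one possible reading of the description, written under assumptions; the product's actual computation is not documented here, and the function names, the tie handling, and the example target name `churn` are hypothetical.

```python
def default_threshold(scores, target_rate):
    """Threshold such that roughly target_rate of the scores lie strictly above it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    k = round(len(ranked) * target_rate)  # number of observations to flag
    if k <= 0:
        return ranked[0]                  # nothing is strictly above the top score
    if k >= len(ranked):
        return float("-inf")              # everything is flagged
    return ranked[k]                      # the top k scores are above this value (barring ties)


def decide(scores, threshold, target_name="churn"):
    """Return a decision column named after the decision_rr_<target variable name> pattern."""
    column = f"decision_rr_{target_name}"
    return [{"score": s, column: 1 if s > threshold else 0} for s in scores]


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.7, 0.3]
    threshold = default_threshold(scores, target_rate=0.4)  # 40% targets in training (assumed)
    for row in decide(scores, threshold):
        print(row)
```

With five scores and a 40% target rate, the two highest scores receive the target category in this sketch.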

Save the model:

If you are satisfied with the Prediction Confidence score, you can save the model in the SAP HANA database in a MODEL type table (a table preloaded with the SAP HANA installation) or execute the generated SQL source code for the SAP HANA database.

Generate Source Code:

Code generation is a component that exports predictive models in different programming languages, such as Java, C, and “SQL Code for SAP HANA”. The generated code allows you to apply predictive models outside the SAP Predictive Analytics application. You can use the “SQL Code for SAP HANA” output to create a procedure within SAP HANA that reproduces the predictive model.

Save the results in SAP HANA table or view.

The result set of the predictive analysis can be saved in SAP HANA as a table or view by specifying a name. Select the “Use Direct Apply in the Database” checkbox to persist the scoring results in the SAP HANA database.

Loading a previously generated predictive model.

You can use a previously created model by choosing “Load Model” under the Modeler menu option.

Then connect to the SAP HANA database and refresh the screen to load the desired predictive model.

You can select input data from any new SAP HANA table or view and retrain the model loaded into the Automated Analytics engine.

Deploy predictive model in SAP HANA (in-database vs. save model).

Preferred option: The in-database option is the preferred one. When you select the checkbox “Use Direct Apply in the Database”, the predictive model and scoring results are stored in the SAP HANA database, and no further action is needed to deploy them.

If you generate source code instead, you need to generate the SQL code for SAP HANA first, and then either execute the standalone code in SAP HANA studio or create a stored procedure that contains the generated code.

Why would I choose one or the other: When business users run reports from a single SAP HANA database, you can choose the in-database option. However, if you have multiple SAP HANA systems and business users run several applications and reports from them, you might need to choose the SQL code for SAP HANA generation option.

4.2 Automated Predictive Library (APL) functions

4.2.1 Checking APL installation

You can install the APL package using the ./hdbinst command on your Linux-based SAP HANA system.

The following post-installation activities need to be performed after installing the APL:

The SAP HANA script server must be enabled.

The SAP HANA index server should be restarted.

You can validate that the APL has been installed correctly by executing the commands listed under “Check that APL functions are installed” below.

Unsupported columns in table types:

While processing data using the Predictive Analysis Library (PAL) or the Automated Predictive Library (APL) in Expert Analytics, you could encounter errors, because the Application Function Library (AFL), the underlying framework for PAL and APL, does not support certain column types in table types. As of SAP Predictive Analytics 2.4, the AFL supports the following CSTYPE (column store type) / SQLTYPE / DIMENSION combinations:

"INT" / "INTEGER"

"FIXED8_19_0" / "BIGINT" ("FIXED8_19_0" means fixed with 8 bytes and 19 digits)

"DOUBLE" / "DOUBLE"

"STRING" / "CLOB"

"STRING" / "NCLOB"

"STRING" / "VARCHAR" / ...

"STRING" / "NVARCHAR" / ...

"DAYDATE" / "DATE"

"SECONDTIME" / "TIME"

"LONGDATE" / "TIMESTAMP"

"SECONDDATE" / "SECONDDATE"

For detailed information on the APL installation and its restrictions, please refer to SAP Note 2215245 and the APL user guide (http://help.sap.com/pa -> APL Documentation).
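To make the list of supported combinations easier to apply, here is a minimal Python sketch that flags columns whose SQL type does not appear in that list. The mapping simply transcribes the list above; the function name and the sample table definition are hypothetical, and the check does not query SAP HANA or replace the restrictions documented in the SAP Note.

```python
# SQL type -> column store type, transcribed from the list above.
SUPPORTED_SQL_TYPES = {
    "INTEGER": "INT",
    "BIGINT": "FIXED8_19_0",
    "DOUBLE": "DOUBLE",
    "CLOB": "STRING",
    "NCLOB": "STRING",
    "VARCHAR": "STRING",
    "NVARCHAR": "STRING",
    "DATE": "DAYDATE",
    "TIME": "SECONDTIME",
    "TIMESTAMP": "LONGDATE",
    "SECONDDATE": "SECONDDATE",
}


def unsupported_columns(columns):
    """Return the columns whose SQL type is not in the supported list.

    `columns` maps a column name to its declared SQL type, for example
    "NVARCHAR(50)"; any length or precision suffix is ignored.
    """
    flagged = {}
    for name, sql_type in columns.items():
        base_type = sql_type.split("(")[0].strip().upper()
        if base_type not in SUPPORTED_SQL_TYPES:
            flagged[name] = sql_type
    return flagged


# Hypothetical table definition used only for illustration.
customer_table = {
    "ID": "INTEGER",
    "NAME": "NVARCHAR(50)",
    "AMOUNT": "DECIMAL(15,2)",
    "CREATED_AT": "TIMESTAMP",
}
print(unsupported_columns(customer_table))
```

In this example only the AMOUNT column is flagged, because DECIMAL is not among the SQL types listed above.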

Check that APL functions are installed

select * from "SYS"."AFL_AREAS";

select * from "SYS"."AFL_PACKAGES";

select * from "SYS"."AFL_FUNCTIONS" where AREA_NAME='APL_AREA';

select "F"."SCHEMA_NAME", "A"."AREA_NAME", "F"."FUNCTION_NAME", "F"."NO_INPUT_PARAMS", "F"."NO_OUTPUT_PARAMS", "F"."FUNCTION_TYPE", "F"."BUSINESS_CATEGORY_NAME" from "SYS"."AFL_FUNCTIONS_" F, "SYS"."AFL_AREAS" A where "A"."AREA_NAME"='APL_AREA' and "A"."AREA_OID" = "F"."AREA_OID";

select * from "SYS"."AFL_FUNCTION_PARAMETERS" where AREA_NAME='APL_AREA';

The SAP HANA APL (Automated Predictive Library) is a native library based on the AFL (Application Function Library) framework which lets you use the data-mining capabilities of the SAP Predictive Analytics engine on data stored in SAP HANA.

4.2.2 Privileges and Security

The APL inherits its security and privilege requirements from the SAP HANA AFL (Application Function Libraries).

When the APL is installed, two new roles are available:

● AFL__SYS_AFL_APL_AREA_EXECUTE
● AFL__SYS_AFL_APL_AREA_EXECUTE_WITH_GRANT_OPTION

Every SAP HANA user who needs to run APL functions must be granted one of the above roles. Additionally, to generate the AFL wrappers for APL functions during the predictive modeling process, the user must have the AFLPM_CREATOR_ERASER_EXECUTE role. For detailed information on user privileges, please refer to the APL user guide (http://help.sap.com/pa -> APL Documentation).
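As an illustration of the role assignments described above, the following Python sketch builds the corresponding GRANT statements for one user. The role names come from the text; the helper name, the user name PA_ANALYST, and the needs_wrappers flag are hypothetical, and the statements assume the usual SAP HANA "GRANT <role> TO <user>" form. Treat the output as a draft to review with your database administrator.

```python
APL_EXECUTE_ROLE = "AFL__SYS_AFL_APL_AREA_EXECUTE"
WRAPPER_ROLE = "AFLPM_CREATOR_ERASER_EXECUTE"


def build_grants(user, needs_wrappers=True):
    """Return the GRANT statements for a user who runs APL functions.

    `needs_wrappers` adds the role required to generate AFL wrappers.
    """
    roles = [APL_EXECUTE_ROLE]
    if needs_wrappers:
        roles.append(WRAPPER_ROLE)
    return [f"GRANT {role} TO {user};" for role in roles]


# PA_ANALYST is a made-up user name.
statements = build_grants("PA_ANALYST")
for statement in statements:
    print(statement)
```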

4.2.3 Usage of APL functions in Automated Analytics

Automated Analytics can use the APL for predictive models when the data is in SAP HANA. For models using SAP HANA as a data source, model training computations are delegated to the Automated Predictive Library in SAP HANA whenever possible, except in the following cases:

• The model is built in the Recommendation or Social Analysis modules.
• The model uses a custom cutting strategy.
• The model uses the option to compute a decision tree.
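The delegation rule above can be read as a simple predicate. The Python sketch below restates it for clarity; the ModelSettings class and its field names are hypothetical and do not correspond to an Automated Analytics API.

```python
from dataclasses import dataclass


@dataclass
class ModelSettings:
    """Hypothetical description of a model, used only for this example."""
    data_source: str                      # for example "HANA" or "FILE"
    module: str                           # for example "Classification"
    custom_cutting_strategy: bool = False
    compute_decision_tree: bool = False


NON_DELEGATED_MODULES = {"Recommendation", "Social Analysis"}


def training_can_be_delegated(settings):
    """Return True when training could be delegated to the APL per the rule above."""
    if settings.data_source != "HANA":
        return False
    if settings.module in NON_DELEGATED_MODULES:
        return False
    if settings.custom_cutting_strategy or settings.compute_decision_tree:
        return False
    return True


print(training_can_be_delegated(ModelSettings("HANA", "Classification")))
print(training_can_be_delegated(ModelSettings("HANA", "Recommendation")))
```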

For more information, see the SAP HANA Automated Predictive Library Reference Guide and the Automated Analytics Preferences Setup Guide on the SAP Help Portal at http://help.sap.com/pa.

4.2.4 Usage of APL functions in SAP HANA studio (SQLScript)

The APL can also be accessed directly using SQLScript from SAP HANA studio.

4.2.4.1 Workflow of calling APL functions

To call an APL function, you need to execute SQLScript statements. The sequence of statements should do the following:

To execute an APL function, you need to insert it into an AFL function call (known as a wrapper).

4.2.4.2 Steps of building predictive models using APL

4.2.4.2.1 Generate an analysis model

The first step of the analysis is to prepare input tables in SAP HANA containing historical information. The analysis model includes descriptions of the data and its relationships from the historical dataset.

Creating the Model.

To create an analysis model, you can use the CREATE_MODEL function in SQLScript within SAP HANA studio and describe the model using the following three input tables:

Function header (FUNC_HEADER)

Operation configuration (OPERATION_CONFIG)

Training dataset (DATASET)

The CREATE_MODEL_AND_TRAIN function can be used for the same purpose. It offers more flexibility, because you can provide the function with two additional input tables: VARIABLE_ROLES and VARIABLE_DESCS. These tables describe the relationships between the data in the training dataset.

4.2.4.2.2 Train the model on an input dataset

Analysts can use the input tables to train the APL model. The function returns summary information in the output tables, as well as performance indicators such as the Predictive Power (KI) and the Prediction Confidence (KR) of the model. The input dataset, also known as the training dataset, contains the historical information used to train the model. You can train the predictive model against this dataset, retrain the model after changing the variables, or use another training dataset in order to create a more powerful model. You can use the TRAIN_MODEL function in SAP HANA studio SQLScript and provide the following input:

INPUT TABLES

FUNC_HEADER: Optional function header table.

DATASET: Table containing the training dataset, including the known target variables.

OPERATION_CONFIG: Information about the operation (type of analysis, cutting strategies to use).

VARIABLE_ROLES: Describes the relationships between the data and specifies which column is the target variable. If you do not declare the target variable, the LAST column is considered to be the target variable.

MODEL: The function returns the output tables mentioned in the OUTPUT TABLES section below.
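The default for the target variable described under VARIABLE_ROLES can be illustrated with a short Python sketch. The shape of the roles list (pairs of column name and role) and the role label "target" are assumptions made for the example; the real VARIABLE_ROLES table layout is documented in the APL user guide.

```python
def resolve_target(dataset_columns, variable_roles=None):
    """Return the target column: the declared one, or else the LAST column.

    `variable_roles` is a list of (column name, role) pairs; this shape
    and the "target" label are hypothetical.
    """
    for column, role in (variable_roles or []):
        if role == "target":
            return column
    return dataset_columns[-1]


# Hypothetical training dataset columns.
columns = ["AGE", "INCOME", "REGION", "CHURNED"]
print(resolve_target(columns))                          # no role declared
print(resolve_target(columns, [("INCOME", "target")]))  # explicit target
```

When no role is declared, the last column (CHURNED here) is used; when a role is declared, it takes precedence.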

OUTPUT TABLES

Descriptions of the output tables returned by the TRAIN_MODEL APL function:

Updated MODEL table: Takes into account the relationships between the data and the target variable in your training dataset.

INDICATORS: Contains the performance indicators, including the predictive power (the quality indicator of the model) and the predictive confidence (the robustness indicator).

SUMMARY: Contains performance indicators provided by the APL, and an overview of the training operation provided by the Automated Analytics engine.

LOG: Contains any status, warning, or error messages returned by the APL function.

Getting the Output Table Type.

You can use the GET_TABLE_TYPE_FOR_APPLY function to get the description of the output table that you should use for the APPLY_MODEL function. Provide the following as input to this function:

● FUNC_HEADER
● OPERATION_CONFIG
● MODEL
● DATASET

The function returns the expected table type definition of the output dataset for an apply operation.

Define Cutting Strategies.

The cutting strategy defines how a training dataset is cut into three subsets:

1. Estimation
2. Validation
3. Test

Depending on the model (model type, number of targets, and so on), not all cutting strategies can be used. The cutting strategy is passed as a parameter in the OPERATION_CONFIG table. Please refer to the APL user guide (http://help.sap.com/pa -> APL Documentation) for more information on how to use cutting strategies when training on a dataset.
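To show what cutting a dataset into estimation, validation, and test subsets means, here is a minimal Python sketch using a seeded random split. The 60/20/20 proportions, the function name, and the random approach are assumptions chosen for illustration; the strategies actually offered by the APL and their parameters are described in the APL user guide.

```python
import random


def cut_dataset(rows, estimation=0.6, validation=0.2, seed=42):
    """Split rows into estimation, validation, and test subsets.

    The proportions are illustrative; whatever is left after the
    estimation and validation shares goes to the test subset.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    n_estimation = int(len(shuffled) * estimation)
    n_validation = int(len(shuffled) * validation)
    return {
        "estimation": shuffled[:n_estimation],
        "validation": shuffled[n_estimation:n_estimation + n_validation],
        "test": shuffled[n_estimation + n_validation:],
    }


# 100 hypothetical row identifiers.
subsets = cut_dataset(range(100))
print({name: len(part) for name, part in subsets.items()})
```

Each row lands in exactly one subset, so the model is estimated, validated, and tested on disjoint data.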

4.2.4.2.3 Apply the trained model to your application dataset

The “Apply Model” APL function produces predictive scores. Through SQLScript you can use the APPLY_MODEL function to get the target variables for your application dataset. Please refer to the APL user guide (http://help.sap.com/pa -> APL Documentation) for more details on how to use this APL function.

4.2.4.2.4 Publishing predictive models in HANA using the PUBLISH_MODEL APL function

Using the PUBLISH_MODEL APL function through SQLScript, you can publish APL-based models into Automated Analytics compatible tables so that a Predictive Analytics user can load the model. Once you have published a model in SAP HANA, you can change the ODBC data source and retrain the model using a new SAP HANA table or view from the same or a different SAP HANA system.

Steps to publish models in SAP HANA using PUBLISH_MODEL APL are:

Please refer to the APL user guide (http://help.sap.com/pa -> APL Documentation) for more information on how to use the PUBLISH_MODEL APL function.

4.2.5 Usage of prepackaged APL stored procedures

The Automated Predictive Library (APL) version 2.4 comes with SQLScript stored procedures that take care of signature tables, table types, and wrappers. As an SAP HANA developer, you can now run APL functions with far less code.

4.3 Expert Analytics

Desktop users can access the APL functions using the “Expert Analytics” interface. It offers a graphical workbench to an expert user who wants to implement specific statistical algorithms and workflows.

Using Expert Analytics you can:

Perform statistical analysis on datasets to understand trends and detect outliers in the business.

Build models and apply them to scenarios and forecast potential future outcomes.

Access almost any data source using JDBC.

Analyze huge data volumes with SAP HANA’s in-memory processing.

When connected to SAP HANA in online mode, Expert Analytics pushes training operations down to the database for APL and PAL nodes in the data-mining stream, so there is no data movement between the presentation layer and the database layer.

Within Expert Analytics, APL algorithms are easily identifiable by their “HANA Auto” prefix. You can sequence algorithms of different types together. For example, you could start with an APL-based Auto Classification analysis and then chain it together with multiple “R” and/or PAL algorithms on each of the individual clusters.

4.3.1 Expert Analytics work flow on HANA datasets

Expert Analytics has integrated rooms for data acquisition, data manipulation, and interactive data visualization. With the Prepare, Predict, Visualize, and Compose rooms, you can run SAP HANA-native predictive functions, store models as stored procedures, and create results as views to build attractive visualizations, with just a few clicks.

4.3.2 Prepare dataset

To run Expert Analytics on SAP HANA, you should connect to SAP HANA in online mode using a JDBC connection (“Connect to SAP HANA” option).

You can create a dataset based on an SAP HANA table or view. When connected in online mode using the “Connect to SAP HANA” option, you can perform formatting actions on the data (for example, filtering, renaming, removing, sorting, or hiding a column, or creating a measure) by selecting the column and clicking on the icon on the top right.

4.3.3 Predict results

You should be able to identify APL functions, prefixed by the word ‘HANA’, in the top right section of the Expert Analytics user interface, under the “Algorithms” section. Similarly, R, BFL, PAL, UDF, and SAP HANA Sentiment Analysis functions show up in the same area.

In order to use the APL functions, you need to install the APL package in the SAP HANA system.

Note: You can only work with the APL/PAL/R/UDF/Sentiment Analysis functions in Expert Analytics when connected to an SAP HANA server in online mode. To use R scripts, you need to set up an R server instance outside SAP HANA.

APL Algorithms supported by Expert Analytics are:

HANA-Auto Classification

HANA-Auto Clustering

HANA-Auto-Regression

You can easily drag and drop an APL function within the Predict room and it will be automatically mapped to the input data set. Then you can configure the predictive function by clicking on it and selecting "Configure Settings" option.

Depending on the function type, you need to set the input variables, target variables and some additional parameter settings.

The system will automatically validate the configuration and you will notice a green check mark on the predictive function if the configuration is correct.

4.3.4 Auto generated reports

Each predictive analysis run in Expert Analytics generates three reports automatically: Data Grid, Summary, and Model Representation.

You can then run the analysis by clicking on the run button.

After successful execution of the analysis, the results will show up in the results tab.

Reports generated through auto-classification APL predictive model run

4.3.5 Exporting predictive models in SAP HANA

The model can be saved as a stored procedure in SAP HANA by clicking on the function and selecting export from the right-click menu.

You need to choose the schema, specify the name of the stored procedure, and the SAP HANA view where the result set will be stored.

Every time the stored procedure is applied to a new dataset the predictive scoring will refresh the view.

4.3.6 Usage of advanced SAP HANA AFL (application function library) based business functions

You can drag and drop the following three new SAP HANA business functions into the Expert Analytics Predict room for your analysis.

1. SAP HANA Demand Forecasting Component

Prerequisites:

To use the SAP HANA Demand Forecasting component, first install the plugin for the Unified Demand Forecast application function library (UDF AFL). For more details, please see SAP Note 2050229.

Users of the SAP HANA Demand Forecasting component require the following role assigned to their user name: AFL__SYS_AFL_UDFCORE_AREA_EXECUTE.

The SAP HANA Demand Forecasting component runs an algorithm on SAP HANA to produce sales predictions for a set period in the future. A primary focus of the component is to forecast processes such as consumer demand for merchandise, and to provide forecast interval information.

2. SAP HANA Optimization Function Component

Prerequisites:

You must have the OFL (Optimization Function Library) execution role, AFL__SYS_AFL_OFL_AREA_EXECUTE, to see the function in the Components list.

As an expert user, you can create an SAP HANA Optimization Function. The function can be thought of as a powerful calculator that enables you to solve complex optimization problems. You create an objective function with linear constraints to calculate how best to optimize an aspect of your business, such as how to maximize profits on a product. You can also save the optimization for future use.
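To make the idea of an objective function with linear constraints concrete, here is a small, self-contained Python sketch of a product-mix problem. It does not use the SAP HANA Optimization Function Library; every number and name in it is invented for the example, and a plain exhaustive search stands in for a real solver.

```python
def best_product_mix(max_a=50, max_b=80):
    """Maximize profit 30*a + 20*b over whole units of two products.

    Illustrative linear constraints (all values are made up):
      machine hours: 2*a + b <= 100
      labor hours:     a + b <= 80
    """
    best = None
    for a in range(max_a + 1):
        for b in range(max_b + 1):
            if 2 * a + b <= 100 and a + b <= 80:
                profit = 30 * a + 20 * b
                if best is None or profit > best[0]:
                    best = (profit, a, b)
    return best


profit, units_a, units_b = best_product_mix()
print(profit, units_a, units_b)
```

Under these assumed constraints, the search settles on 20 units of product A and 60 units of product B, the point where both constraints are exactly met.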

[Screenshot labels: Sales Forecast; Define Optimization Function]

3. SAP HANA Sentiment Analysis Component

Prerequisites:

Server: HANA system (SPS 9+) with PAL, APL and R configured.

Client: SAP Predictive Analytics 2.4 installed and R configured.

The SAP HANA Sentiment Analysis component enables you to analyze a complex stream of text (for example, the opinions of Twitter users about a product or service). The component analyzes the opinion contained in each unit of text and relays whether the sentiment is positive or negative. This transforms unstructured data into a series of easily understandable categories to discover influencing factors. From there, you can generate insights to better run your business.

4.4 Automated Analytics or Expert Analytics: what to choose when?

A business analyst with little data science knowledge can use Automated Analytics and SAP HANA to generate a predictive model with a few clicks. He or she can push the computations down to SAP HANA using APL delegation and in-database scoring.

For an advanced data analyst or data scientist, Expert Analytics may be more suitable if they need specific, tailored R or SAP HANA algorithms to answer their business needs.

[Screenshot label: Define custom sentiments]

4.5 Model Manager

In an SAP Predictive Analytics client-server environment, a Model Manager server is usually also deployed. Model Manager is a thin-client, Web server-based application that allows you to manage all of your models from a central location and automate modeling activities.

4.5.1 Import Models

You can manage models by importing a model from an SAP HANA system.

By importing the predictive models from SAP HANA, you can perform the following tasks within Model Manager:

Display all the available information of a model.

Change the settings related to a model.

Change the version of the model.

In Model Manager, you cannot use SAP HANA information views for retraining or applying a model.

Note: Model Manager is available only on Microsoft Windows, so if your modeling server is deployed on a Linux server, you need to set up a separate Windows server for model management.

4.5.2 Model Management on SAP HANA

Once a model is saved in SAP HANA tables created via Automated Analytics, or published in SAP HANA using the PUBLISH_MODEL APL function in SAP HANA studio, you can import the model into Model Manager and perform various model management tasks.

Several users can work on the same modeling project and schedule the following types of tasks:

Retrain a model

Apply a model to a new dataset

Detect model deviations

Detect deviation of a dataset

Note: Predictive models generated and published in SAP HANA system through APL functions can be loaded into Model Manager to perform model management activities. The below diagram shows this scenario.

4.5.3 Auto generated reports

Additionally, you can view the automatically generated reports once the scheduled model management tasks finish.

Executive Report.

The Executive Report shows detailed information about a model. It appears when you click the ‘Executive Report’ link in the row of the model.

The “Executive Report” highlights the key performance indicators of the model and also displays the performance deviation of the model from the best possible one.

Task Monitoring Reports

Task monitoring reports show users detailed information on several tasks through the individual sets of reports below.

Variables Usage.

Variable Usage report describes the variable usage in the predictive analytics

Server Usage.

The Server Usage report highlights resource usage in the predictive analytics server.

Activity Monitoring.

The Activity Monitoring report shows the activities performed within the predictive analytics server.

Summary

In this Beginner’s Guide to Get Started with SAP Predictive Analytics on SAP HANA, we discussed how SAP Predictive Analytics and SAP HANA together, as a solution, open new doors not only to data scientists but also to business users, who can auto-generate and embed predictive models within the SAP HANA database layer itself. This makes it easier for downstream users and applications to consume the results of a predictive model. This document also describes the various tools and technical deployment techniques for implementing SAP Predictive Analytics on SAP HANA.

SAP Predictive Analytics can use SAP HANA views or tables as data source and the Automated Predictive Library (APL) brings SAP HANA-native predictive capabilities to customers in a simple and non-disruptive way.

The APL’s capabilities can be leveraged within a specialized tool such as Automated Analytics, in a more general way through SQLScript in SAP HANA studio, or via a more data scientist friendly tool such as Expert Analytics.

Many of our customers have already spent years investing in SAP HANA and many others are looking at adopting SAP HANA and SAP Predictive Analytics. Both solutions together can bring enormous value in terms of real time automated predictive capabilities.

Related Content

http://help.sap.com/pa - Latest SAP Predictive Analytics Installation & User Guides

http://scn.sap.com/community/predictive-analytics - Latest information from the SAP Predictive Analytics community

Blog-APL - Example Scenario: SAP Predictive Analytics, HANA APL (Automated Predictive libraries): Classification

Copyright

© 2015 SAP SE or an SAP SE affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE and its affiliated companies (“SAP SE Group”) for informational purposes only, without representation or warranty of any kind, and SAP SE Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP SE and other SAP SE products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE in Germany and other countries.

Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.