2017 OSIsoft TechCon Apply Data Science and Machine Learning … · 2019-07-12 · At TechCon 2016,...


2017 OSIsoft TechCon

Apply Data Science and Machine Learning to PI System Data for Predictive Analytics


OSIsoft, LLC 777 Davis St., Suite 250 San Leandro, CA 94577 USA Tel: (01) 510-297-5800 Web: http://www.osisoft.com © 2017 by OSIsoft, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of OSIsoft, LLC. OSIsoft, the OSIsoft logo and logotype, PI Analytics, PI ProcessBook, PI DataLink, ProcessPoint, PI Asset Framework (PI AF), IT Monitor, MCN Health Monitor, PI System, PI ActiveView, PI ACE, PI AlarmView, PI BatchView, PI Coresight, PI Data Services, PI Event Frames, PI Manual Logger, PI ProfileView, PI WebParts, ProTRAQ, RLINK, RtAnalytics, RtBaseline, RtPortal, RtPM, RtReports and RtWebParts are all trademarks of OSIsoft, LLC. All other trademarks or trade names used herein are the property of their respective owners. U.S. GOVERNMENT RIGHTS Use, duplication or disclosure by the U.S. Government is subject to restrictions set forth in the OSIsoft, LLC license agreement and as provided in DFARS 227.7202, DFARS 252.227-7013, FAR 12.212, FAR 52.227, as applicable. OSIsoft, LLC. Published: April 26, 2017

Apply Data Science and Machine Learning to PI System data for Predictive Analytics

Hands-on Lab – OSIsoft TechCon 2017

Lead: Gopal GopalKrishnan, P.E., Solution Architect

Lead: Curt Hertler, Solution Architect

Instructor: Yvonne Radsmikham, CSS Engineer

Instructor: Erica Trump, Instructional System Designer


Table of Contents

Lab Description and Overview
  Lab Description
  Overview
  Learning Objectives
PI System software components
Part I – Explore AHU Operations and Extract Data using PI Integrator
  Explore air handler operations data
  Extract data using PI Integrator
Part II – Dimensionality Reduction for Multivariate Data with Principal Component Analysis using R
Part III – SVM – Support Vector Machine
  Azure Machine Learning Studio - by Microsoft
  Azure ML Studio Login
  Import dataset
Part IV – Deploying the Model and Interactive Display
  Windows PowerShell Scripting
  Azure ML web service details
  PowerShell scripts
  Testing AHU Fault Status prediction using PowerShell script
  Windows Task Scheduler
  PI Integrator – Continuous Publishing
  Shiny – by RStudio – a Web Application Framework for R
Appendix A: Using PI Web API with R
Reference Materials


Lab Description and Overview

Lab Description

At TechCon 2016, we reviewed an end-to-end use case for developing a machine learning (multivariate PCA - principal component analysis) model to predict equipment failure. This lab builds on those concepts but we now use data from a process unit operation and apply data science and machine learning methods for diagnostics. You will learn how to use time-series sensor data with free trial versions or open source libraries such as R and others, to build data-driven models and deploy them for real-time analytics.

Attendees are expected to be familiar with the full PI System stack. Programming skills are not required.

Level: 300 Duration: 3 hours

Overview

Troubleshooting faulty processes and equipment, also known as FDD (fault detection and diagnostics) or anomaly detection, is a challenge. This hands-on lab provides an end-to-end walk-through for applying data-driven techniques, specifically machine learning, to such tasks.

The learning objectives of this lab include:

▪ Extracting data from the PI System using PI Integrator

▪ Using the PI System data with R: data cleansing, feature selection, model development for a multivariate process using PCA (principal component analysis), etc.

▪ Using the PCA model with Shiny https://shiny.rstudio.com/ to create an interactive display for

visualizing and exploring faults vs. normal operation; also using SVM (support vector machine)

for classification and prediction of Air Handler (AHU) fault/no-fault state

▪ Using Azure ML with PI System data for machine learning

▪ Deploying the machine learning model for continuous execution with real-time data

▪ Understanding the end-to-end data science process – data retrieval, data cleansing, shaping and

preparation with meta-data context, feature selection via domain specific guidelines, applying

machine learning methods, visualizing the results and operationalizing the findings

Applications of data science and machine learning methods are well known in several fields: image and speech recognition, fraud detection, search, shopping recommendations, and others. In manufacturing, including manufacturing operations management, and particularly in plant-floor operations with time-series sensor data, select data science and machine learning methods are highly effective.


Principal Component Analysis (PCA) is one such well-known and established machine learning technique for gaining insights from multivariate process operations data. PCA has several use cases, including exploratory analysis, feature reconstruction, and outlier detection. Other derived algorithms such as PLS (projection to latent structures), O-PLS (orthogonal …), PLS-DA (… discriminant analysis) etc. are also widely used in industry.

In a multivariate process, several parameters - sometimes just a handful but often dozens of parameters

- vary simultaneously, resulting in multi-dimensional datasets that are difficult to visualize and analyze.

Examples of multivariate processes are:

▪ Brewery - Beer fermentation

▪ Oil Refinery – Distillation column

▪ Facilities – Heating, Ventilation and Air-Conditioning (HVAC) - Air Handler Unit

▪ …

▪ …

Figure 1 illustrates how a multivariate (bivariate in the figure below) view allows you to quickly detect

outliers even though the individual variables may still be within control limits.

Figure 1 Multivariate view of process variables (ref. http://www.isixsigma.com)
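The idea behind Figure 1 can be sketched numerically. The lab itself works in R; the Python example below is only an illustration, and the synthetic data, the ±3σ limits and the helper name `mahalanobis_sq` are assumptions that do not come from the lab materials. It builds two strongly correlated variables and then checks a point that sits inside each variable's individual limits but breaks the correlation pattern.

```python
import numpy as np

def mahalanobis_sq(point, data):
    """Squared Mahalanobis distance of `point` from the center of `data`."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = np.asarray(point, dtype=float) - mean
    return float(diff @ np.linalg.solve(cov, diff))

# Synthetic, strongly correlated pair of process variables (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
y = 0.95 * x + rng.normal(0.0, 0.3, 500)
data = np.column_stack([x, y])

# A point that respects both individual limits but not the joint pattern
candidate = np.array([2.0, -2.0])
within_univariate = bool(
    np.all(np.abs(candidate - data.mean(axis=0)) < 3 * data.std(axis=0))
)
d2_candidate = mahalanobis_sq(candidate, data)

# A point that follows the usual relationship between the two variables
d2_typical = mahalanobis_sq([1.0, 0.95], data)
```

Here `within_univariate` is true, so two separate control charts would raise no alarm, while `d2_candidate` is far larger than `d2_typical`. That gap is what the bivariate view makes visible.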

In this lab, we use the Air Handler Unit (AHU) to illustrate an approach for analyzing such multivariate

processes. A typical HVAC system with AHU is shown below.


Figure 2 HVAC system with Air Handling Unit (AHU)

Sensor data available from the AHU, as part of the BMS (building management system) are:

▪ Outside air temperature

▪ Relative Humidity

▪ Mixed air temperature

▪ Supply air temperature

▪ Damper position

▪ Chilled water flow

▪ Supply air flow

▪ Supply air fan VFD (variable frequency drive) power

▪ …

▪ …

During the course of a day, the AHU operating conditions change continuously as the outside air

temperature rises and falls, along with changing relative humidity, changing thermostat set-points,

building occupancy level, and others. The BMS control system adjusts the supply air flow rate, chilled

water flow rate, damper position etc. to provide the necessary heating or cooling to the rooms to ensure

tenant comfort.


However, fault conditions such as incorrect/drifting sensor measurements (temperature, flow, pressure

…), dampers stuck at open/closed/partial-open position, stuck chilled water valve, and others, can waste

energy, or lead to tenant complaints from other malfunctions causing rooms to get too hot or too cold.

For troubleshooting and diagnostics, HVAC engineers need tools to answer questions such as:

o How can I use data to detect faulty AHU operations, e.g. an air damper stuck 100% open on a hot day in mid-July?

o What’s the AHU “state” during 100 ºF + days? In 2016? In 2015? And, in 2014 before we

installed the Economizer?

o What are the AHU outlier/extreme operating states?

o How did it get to the extreme state; what were the immediate prior operating states for that

day?

o What’s the AHU state at supply fan flow limit constraint? When did it happen?

o …

o …

Figure 3 shows an interactive web page for visualizing AHU operations data at 1-hour resolution. The multidimensional AHU data is projected onto a 2-D plotting surface using PCA in R (https://cran.r-project.org/), and the web page was authored using Shiny.

URL for the interactive webpage is https://gopalosi.shinyapps.io/shiny/
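As a rough illustration of what "projecting multidimensional data to 2-D using PCA" involves, here is a minimal Python sketch. The lab performs this step in R; the function name `pca_2d`, the synthetic columns and the choice to autoscale each variable are assumptions made for this example, not details taken from the lab script.

```python
import numpy as np

def pca_2d(X):
    """Autoscale the columns of X and project onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by explained variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:2].T                  # shape (n_variables, 2)
    scores = Z @ loadings                # shape (n_observations, 2)
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:2]

# Synthetic stand-in for AHU data: three variables follow outside air
# temperature, one (damper position) varies independently.
rng = np.random.default_rng(1)
oat = rng.uniform(45, 110, 200)
X = np.column_stack([
    oat,
    100 * oat + rng.normal(0, 200, 200),   # hypothetical supply air flow
    2 * oat + rng.normal(0, 4, 200),       # hypothetical chilled water flow
    rng.uniform(0, 100, 200),              # hypothetical damper position
])
scores, loadings, explained = pca_2d(X)
```

Each row of `scores` is one observation's (PC1, PC2) position, which is the kind of point plotted in a score scatter plot. `loadings` shows how much each original variable contributes to each axis.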


Figure 3 Air Handler Unit - Operations display – each point or bubble shows hourly AHU “state” for 2016

➢ x-axis=principal component 1, y-axis= principal component 2

➢ Size of the bubble corresponds to the Count of an operational state, i.e. how often is the AHU at

an operational state

➢ Color of the bubble indicates outside air temperature (OAT) - ranges from 45 ºF to 110 ºF. Colder

days (orange bubbles) are on the far right and towards the top and hotter days (pink bubbles)

are on the far left and towards the top

➢ Select OAT (outside air temperature) to view all AHU operations at a specified value; in the plot

OAT is set to 100 ºF and the corresponding operating states are shown as black circles

➢ Select SA (supply air flow, cfm) to view all AHU operations at specified value; black circles show

for the selected value (this is not shown in the plot)

➢ Select a region using a brushing action (blue shaded rectangle in the plot) to see corresponding

AHU operations data in the Table. The plot shows an outlier i.e. extreme operating point in the

blue shaded rectangle – and the details are shown in the table as 5/18/2016 17:00 data

➢ Select a date – the black dotted line shows the progression of the AHU operational state (8am to

8pm) in the direction of the arrow. Detailed hour-by-hour operations data is displayed in the

table below the plot

➢ Anomalies or faulted AHU states, if any, are indicated with red circles
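One way to picture the bubble sizes is as counts of observations that fall in the same region of the score plot. The Python sketch below assumes that interpretation. The grid-snapping approach, the cell size and the function name `count_states` are illustrative assumptions, and the lab's Shiny application may compute the count differently.

```python
from collections import Counter

def count_states(scores, cell=0.5):
    """Snap each (pc1, pc2) score to a grid cell and count how often each cell occurs."""
    counts = Counter()
    for pc1, pc2 in scores:
        key = (round(pc1 / cell) * cell, round(pc2 / cell) * cell)
        counts[key] += 1
    return counts

# Made-up hourly scores: three near the origin, two near (2, -1)
hourly_scores = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.22), (2.1, -1.0), (2.0, -0.9)]
bubbles = count_states(hourly_scores)
```

Each key in `bubbles` would become one bubble, and its count would set the bubble size.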


The interactive web display can be used as a troubleshooting tool to investigate AHU operating states.

Figure 4 below shows the AHU PCA plot. It helps you interpret the effect of each original variable (Outside Air Temperature, Supply Air Flow, Cooling Coil Output etc.) as projected onto a 2-D surface and as it applies to the AHU “state.”

Figure 4 AHU PCA score scatter plot – labels (such as 9506, 9507 etc.) are observation IDs and each ID corresponds to 10 minutes of AHU operation

After the AHU operational states have been validated with several months (or years) of historical

performance data, the machine learning model is deployed for real-time monitoring. Operating states

showing sudden or unusual changes, or operating states outside “known operating envelope” etc. are

flagged for fault-investigation.

Visual analysis of the various AHU operational states also helps you to set data-driven rules (instead of

just using best practice or conventional wisdom) for fault detection.


However, you don’t have to be “watching the plot” to be notified of faults. A second machine learning

technique that can be layered on to the base PCA model is SVM (support vector machine), and

specifically, one-class SVM. SVM classification can be used to identify fault/no-fault AHU states.

However, when you have no or insufficient data for “faulty state”, the one-class SVM is applied.

One-class SVM – also called SVDD (support vector data description) – is used along with the reduced

dimension dataset from the PCA model for visualization.

Learning Objectives

As stated earlier, the learning objectives of this lab include:

▪ Extracting data from the PI System using PI Integrator

▪ Using the PI System data in R, data cleansing, feature selection, model development etc.

▪ Using the PCA (principal component analysis) model with Shiny https://shiny.rstudio.com/ to create an interactive display for visual analytics; also using SVM (support vector machine) for classification and prediction of AHU fault/no-fault state

▪ Using Azure ML with PI System data for machine learning

▪ Deploying the machine learning model for continuous execution with real-time data

▪ Understanding the end-to-end data science process – data retrieval, data cleansing, shaping and

preparation with meta-data context, feature selection via domain specific guidelines, applying

machine learning methods, visualizing the results and operationalizing the findings


PI System software components

The VM (virtual machine) used for this lab has the following PI System software components installed:

Software                                                   Version
PI Data Archive                                            2016 R2
PI Asset Framework (PI AF) server                          2017 (pre-release)
PI Asset Framework (PI AF) client (PI System Explorer)     2017 (pre-release)
PI Analysis & PI Notifications Services                    2017 (pre-release)
PI Coresight                                               2017 (pre-release)
PI Web API                                                 2017 (pre-release)
PI Integrator for Business Analytics (DW Edition)          2017

For details on PI System software, please see: http://www.osisoft.com/pi-system/pi-capabilities/product-list/

For PI Integrator for Business Analytics, please see:

http://www.osisoft.com/corporate/business-analytics/

The following open source software is installed:

▪ R 3.3.2 https://cran.r-project.org/

▪ RStudio 1.0.136 https://www.rstudio.com/

▪ Shiny 1.0.0 https://shiny.rstudio.com/


Part I – Explore AHU Operations and Extract Data using PI Integrator

Explore air handler operations data

First, let us examine the available sensor data. With PI System Explorer, navigate to element AHU03.

Note the Attributes under ML category.


Open the Coresight display using the shortcut icon on your desktop.

Take a few minutes to examine the AHU operations in terms of the attributes listed in the ML category.

Note that AHU data has been back-filled for only the ML Category attributes for the period 01-Mar-2016

to 31-Oct-2016.

Discussion items:

▪ When is the AHU running? What hours of the day?

▪ What days of the week is the AHU running?

▪ What do you need to know about the data as you are preparing to explore it?

▪ …

▪ …

▪ …


Extract data using PI Integrator

PI Integrator for Business Analytics (Data Warehouse edition) is a tool developed by OSIsoft to support the integration of real-time operations data with various environments including:

- Self-service BI analytic tools like Microsoft Power BI, Tableau or Tibco Spotfire
- Data Warehouse databases like Teradata, Oracle or Microsoft SQL Server
- Big Data platforms like SAP HANA and Hadoop
- Machine learning and statistical analytics tools like SAS and R

In this lab, we will be working with R. Hence, we will use the PI Integrator for Data Warehouse to organize the dataset and publish data to a text file format that can be imported into R.

If you prefer to skip the hands-on portion and the rest of Part I, a dataset previously published using PI Integrator is available as Student01_20170227174300.txt. It can be used in Part II and in other parts of this lab without loss of continuity.

1. Access the PI Integrator by opening Internet Explorer. The default web site for the PI Integrator is https://pisrv01:444/. You should see the main page of the integrator, showing a list of previously configured views:

To get started, select “Create Asset View” top menu.

2. Next, you are prompted to give your Asset View a name. Enter a name and click Create View.


3. Select Server PISRV01 and Database UC2017. The Building element appears in the AF hierarchy.

Click on Building element and drill-down the hierarchy to locate AHU03.

When you select an asset, the Attributes pane opens below the hierarchy to show the selected

element’s attributes.


4. Next, you select the attributes to be included in your Asset View. To do this, group the attributes

by category.


5. Drag and drop the “ML” category into the middle (Asset Shape) pane; this will populate the pane

with all the attributes in that category for element AHU03.

6. Remove “FaultStatus_Calc” using the “X” icon. FaultStatus_Calc is a calculated variable and is

not part of the sensor measurements.

Click Next button in the upper right corner.


7. The AHU03 data is backfilled for the period from 1-Mar-2016 to 31-Oct-2016. Hence, change the start and end time accordingly and click Apply.

8. Using the “Edit Value Mode” dialog, you have the option to change how often the data is

sampled from the PI System. The default selection is to interpolate values every minute.

The data are backfilled at 10 minute intervals, so we can change the “Sample values every” to 10

minutes and click Save Changes.
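To make "interpolate values every N minutes" concrete, the following Python sketch resamples irregular (timestamp, value) pairs onto an evenly spaced grid using linear interpolation. It is an illustration only: the function name `interpolate_at` and the sample values are hypothetical, it assumes the grid lies within the range of the samples, and PI Integrator's own interpolation may behave differently (for example, for step-type attributes).

```python
from datetime import datetime, timedelta

def interpolate_at(samples, start, end, step):
    """Linearly interpolate (timestamp, value) samples onto an evenly spaced time grid."""
    samples = sorted(samples)
    out = []
    t = start
    i = 0
    while t <= end:
        # Advance to the pair of neighbouring samples that brackets t
        while i + 1 < len(samples) - 1 and samples[i + 1][0] <= t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        frac = (t - t0) / (t1 - t0)
        out.append((t, v0 + frac * (v1 - v0)))
        t += step
    return out

# Made-up readings that rise by one unit per minute
raw = [
    (datetime(2016, 3, 1, 8, 0), 70.0),
    (datetime(2016, 3, 1, 8, 7), 77.0),
    (datetime(2016, 3, 1, 8, 25), 95.0),
]
grid = interpolate_at(
    raw,
    start=datetime(2016, 3, 1, 8, 0),
    end=datetime(2016, 3, 1, 8, 20),
    step=timedelta(minutes=10),
)
```

With these readings the grid has three rows, at 08:00, 08:10 and 08:20, with values of approximately 70, 80 and 90.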


9. Use Add Column to add time-related items, then click Display 4 time columns.

Confirm that you see the time based columns that you just added and then Click Next in the top

right corner.
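Time-based columns like these are useful later for questions such as "what hours of the day is the AHU running?". The Python sketch below shows the general idea of deriving calendar fields from a timestamp. The four field names are an assumption made for illustration, since the text does not list which time columns PI Integrator adds.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def time_columns(timestamp):
    """Derive calendar fields from a timestamp (the choice of fields is an assumption)."""
    return {
        "Year": timestamp.year,
        "Month": timestamp.month,
        "DayOfWeek": DAY_NAMES[timestamp.weekday()],  # weekday(): Monday is 0
        "Hour": timestamp.hour,
    }

cols = time_columns(datetime(2016, 5, 18, 17, 0))
```

For the 5/18/2016 17:00 observation mentioned in the overview, this gives a Wednesday at hour 17.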

10. Select Text File as Target Configuration, and ensure that Run Once is selected.

Click on Publish for the PI Integrator to start publishing the dataset to a file. You will have to Confirm this.


You will then be directed back to the page showing the list of Views. The bottom of the page shows a run status of the publish action.

11. Once the publication is finished, open the lab folder (C:\users\documents\student01\DataScienceDevLab\AHU). You will see the text file you have just published. It will be named with your Asset View name and a time stamp.

A dataset previously published using PI Integrator is available as Student01_20170227174300.txt. It can be used in the next Part of this lab without loss of continuity.

In the next section, we will use this file as input to the R script for analysis.
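Before analysis, a published text file has to be parsed into rows of numbers. The lab does this in its R script; the Python sketch below shows the general pattern only. The tab delimiter, the column names and the `read_published` helper are all assumptions, because the actual layout depends on how the Asset View was configured. An in-memory sample stands in for the real file.

```python
import csv
import io

# Hypothetical sample standing in for the published file
SAMPLE = (
    "TimeStamp\tOutside Air Temperature\tSupply Air Flow\n"
    "2016-03-01 08:00:00\t55.2\t8100\n"
    "2016-03-01 08:10:00\t55.9\t8150\n"
)

def read_published(handle, delimiter="\t", time_column="TimeStamp"):
    """Read a delimited text file into a list of dicts, converting numeric fields to float."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        row = {}
        for name, value in record.items():
            if name == time_column:
                row[name] = value
            else:
                try:
                    row[name] = float(value)
                except (TypeError, ValueError):
                    row[name] = None  # non-numeric or missing entries
        rows.append(row)
    return rows

rows = read_published(io.StringIO(SAMPLE))
```

To read an actual file, pass an open file handle instead of the `io.StringIO` object and adjust the delimiter and time column name to match the published output.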


Part II – Dimensionality Reduction for Multivariate Data with Principal Component Analysis using R

From the DataScienceDevLab\AHU folder, select AHULab.R and double-click to open it in RStudio (please be patient, RStudio takes a few seconds to open).

The screen below shows the RStudio user interface.


The following sections show the output when you step through the script line by line using

Run.

The pages below have been extracted from the script output document AHU.html (see the lab

folder for the latest revision).


We will use the above display from R in the next two Parts.

To detect outliers and faulty operations, you can “score” new data using the pc1eq and pc2eq

equations shown above and overlay new “scored” data against a backdrop of historical AHU

operational scores.
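Scoring a new observation amounts to applying a fixed linear equation per component to the centered and scaled sensor values. The Python sketch below illustrates that pattern. The variable names, centers, scales and loadings are made-up placeholders and are not the coefficients of the lab's pc1eq and pc2eq equations, which come from the fitted R model.

```python
def score(observation, center, scale, loadings):
    """Apply one linear principal-component equation to a single observation."""
    return sum(
        loadings[name] * (observation[name] - center[name]) / scale[name]
        for name in loadings
    )

# Placeholder model parameters (illustrative values only)
center = {"OAT": 75.0, "SAF": 8000.0}
scale = {"OAT": 10.0, "SAF": 1000.0}
pc1_loadings = {"OAT": 0.6, "SAF": 0.8}
pc2_loadings = {"OAT": -0.8, "SAF": 0.6}

new_obs = {"OAT": 85.0, "SAF": 9000.0}
pc1 = score(new_obs, center, scale, pc1_loadings)
pc2 = score(new_obs, center, scale, pc2_loadings)
```

With these placeholder numbers the new observation lands at roughly (1.4, -0.2) in the score plot. Because the center, scale and loadings stay fixed after training, new points are directly comparable with the historical scores.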

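Outliers are determined using Hotelling's T² values, which measure how far an observation's scores lie from the center of the model. (DModX, the residual distance of an observation to the model plane, is a related but distinct diagnostic.) Observations exceeding a defined upper limit, say 95%, are shown as points with red circles.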
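A minimal Python sketch of this outlier test for a two-component model follows. It is an illustration under stated assumptions: the score variances are made-up values, the function names are hypothetical, and the limit uses the large-sample chi-square approximation with two degrees of freedom. The exact limit is based on the F distribution and depends on the number of training observations, so the lab's R script may produce a somewhat different threshold.

```python
import math

def hotelling_t2(scores, variances):
    """Sum of squared scores, each divided by that component's variance."""
    return sum(t * t / v for t, v in zip(scores, variances))

def chi2_limit_2dof(confidence=0.95):
    """Large-sample T² limit for two components (chi-square quantile, 2 d.o.f.)."""
    # For 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2)
    return -2.0 * math.log(1.0 - confidence)

variances = (4.0, 1.0)          # hypothetical variances of the PC1 and PC2 scores
limit = chi2_limit_2dof(0.95)

normal_point = hotelling_t2((1.0, 1.0), variances)
extreme_point = hotelling_t2((6.0, 0.0), variances)
flags = [t2 > limit for t2 in (normal_point, extreme_point)]
```

Only the second point exceeds the limit and would be drawn with a red circle.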

Open source R does not include deployment services for continuous and unattended execution for scoring

new data. However, Azure ML provides a way to deploy R based models for scoring.

Azure ML also allows you to develop models using several machine learning algorithms.

In the next Part, we will review Azure ML and use it with the R based visuals that we

developed earlier.

In the next Part, we will also review a different anomaly detection technique called SVM (support vector machine). In particular, we will use one-class classification, since we are doing exploratory analysis and our dataset is treated as belonging to a single NORMAL class, without any data pre-identified as FAULT.


Part III – SVM – Support Vector Machine

Azure Machine Learning Studio - by Microsoft

In this portion of the lab, we will use Azure ML services to develop and deploy a model using

machine learning for the AHU operations data.

Azure Machine Learning Studio is an interactive, drag-and-drop workspace that allows you to build, test, and deploy models. We will take a step-by-step approach to give you a feel for

the basics of using Azure ML with PI System data.

Each step involves selecting and configuring one or more Azure ML module(s), as below:

• Import Dataset- use saved datasets that were uploaded to Azure Machine Learning Studio

• Project Columns - select the columns of the table to include in the predictive model, e.g.

Outside Air Temperature, Supply Air Flow, Return Air Temperature.

• Split - divide the data into two sets, the first one for training, called Training Dataset, and the

second one for test, called the Test Dataset. The modelling process works best when you train,

or fit, the predictive model to a portion of the dataset. In this way, the trained model can be

tested against the remaining portion of the dataset to see how well it performs against data

which it has not yet “seen”.

• Train Model - connects the training dataset with a chosen machine learning algorithm such as

“PCA”, “SVM” and others.

• Score Model - test the model against the training and/or test Dataset.

• Convert to CSV - converts the data input to CSV (comma-separated values) format so that it can be imported into other tools

• Execute R Script - use R programming to manipulate and better visualize the model output data

If you prefer to skip the hands-on portion in this Part, a fully completed model is available in your assigned workspace; it is titled “AHU SVM Final”.

Azure ML Studio Login

Go to https://studio.azureml.net/ and log in to Microsoft Azure Machine Learning Studio.

Please use the following credentials.

Your Azure ML account name and login will use the last two digits of your assigned VM, i.e.

“3926vlecs1.cloudapp.net:60001” translates to [email protected].

Then, when requested, login using: User name: osi\student26 Password: Maythe4thBeWithYou!

Import dataset

Select “DATASETS”, and on the bottom left corner, click “NEW”.

Select “FROM LOCAL FILE”

Use the “Browse” button and select the exported text file from PI Integrator (in the

DataScienceDevLab\AHU folder).

The file exported from PI Integrator will be named “Student01…”. However, when you upload it to Azure ML, please rename it in the above dialog to “AHU student26…” if your VM is 3926vlecs1.cloudapp.net

After it is uploaded, confirm that it is listed.

Building the Experiment in Azure Machine Learning Studio

1. Select “EXPERIMENTS”, and on the bottom left corner, click “NEW” and then select “Blank

Experiment”.

Each Azure ML Workspace is shared by four students. A workspace can host many experiments.

To avoid name conflicts, please use “AHU studentxx” for your experiment, where xx indicates the last 2 digits in the number portion of your machine – for example, 3926vlecs1.cloudapp.net will use 26 and name the experiment as “AHU student26”.
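The naming rule above can be sketched in a few lines. This is only an illustration written in Python (it is not one of the lab scripts), and the helper names student_suffix and experiment_name are hypothetical:

```python
import re


def student_suffix(vm_host: str) -> str:
    """Return the last two digits of the numeric prefix of a lab VM host name."""
    match = re.match(r"(\d+)", vm_host)
    if match is None or len(match.group(1)) < 2:
        raise ValueError(f"no numeric prefix with two digits in {vm_host!r}")
    return match.group(1)[-2:]


def experiment_name(vm_host: str) -> str:
    """Build the experiment name from the VM host name (hypothetical helper)."""
    return f"AHU student{student_suffix(vm_host)}"


print(experiment_name("3926vlecs1.cloudapp.net"))  # AHU student26
```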

2. Experiment Title - Give the experiment a title by editing the Title Area at the top of the Design

Canvas.

3. Navigate to “Saved Datasets”, and then “My Datasets”. Drag and drop the dataset you uploaded

to the experiment workspace.

If you did not export a file from PI Integrator, you can use “student01_20170227174300.txt”

Right-click on the connector at the bottom edge and select Visualize. Confirm that the number of rows matches that of the dataset. Also, confirm that the columns you exported from PI Integrator are included in the dataset.

Following the same steps as above, upload the pc.csv file that you “saved” using the R script in

Part II.

Please prefix “studentXX” to file names when uploading to Azure ML

If you skipped the export during the R portion of the lab, please use “AHU PC FINAL.csv”

4. In the “Data Transformation” tab, expand “Manipulation”, drag and drop the “Apply SQL Transformation” module onto the canvas, and connect the dataset module to it. In the SQL Query script for the “Apply SQL Transformation” module, enter the SQL query below:

Select * from t1 where ([Day of the Week]!='Sunday' AND [Day of the Week]!='Saturday') AND ([Hour] >=8 and [Hour] <=20)

You can copy/paste from SQLScripts.txt file in the SQLScripts folder

This query removes rows corresponding to weekends and keeps only rows whose Hour value is between 8 and 20 inclusive, i.e. we select data that represents the core AHU operating hours (roughly 8am to 8pm).
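To see what the query keeps and drops, it can be tried on a handful of rows outside Azure ML. The sketch below is in Python rather than R and uses the standard-library sqlite3 module, on the assumption that the module's SQL dialect is SQLite-compatible; the six sample rows and the three-column table are made up for illustration and are not the lab dataset.

```python
import sqlite3

# The query text from the step above, unchanged.
QUERY = (
    "Select * from t1 where ([Day of the Week]!='Sunday' AND "
    "[Day of the Week]!='Saturday') AND ([Hour] >=8 and [Hour] <=20)"
)

# Hypothetical sample rows: (day of week, hour, supply air temperature)
rows = [
    ("Monday", 7, 55.1),     # dropped: before hour 8
    ("Monday", 8, 55.4),     # kept
    ("Friday", 20, 56.0),    # kept: hour 20 is included
    ("Friday", 21, 56.2),    # dropped: after hour 20
    ("Saturday", 12, 57.3),  # dropped: weekend
    ("Sunday", 9, 54.8),     # dropped: weekend
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 ("Day of the Week" TEXT, "Hour" INTEGER, '
    '"Supply Air Temperature" REAL)'
)
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
kept = conn.execute(QUERY).fetchall()
conn.close()

print(kept)
```

Note that a reading taken at hour 20 (for example 20:50) passes the filter, because the condition is on the Hour column only.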

5. Insert “Clean Missing Data” module from “Data Transformation” > “Manipulation”. Connect the

module to the existing “Apply SQL Transformation” module.

Use all the default properties for the module, except for the following:

Cleaning mode: Remove entire row

From Part II, we know that this dataset (with approx. 13,000 rows) contains about 10 rows with

null values for certain variables. The “Clean Missing Data” operation removes such rows.
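The effect of the “Remove entire row” cleaning mode can be sketched as follows. This is a plain Python illustration, not the Azure ML implementation; the function name and the three sample rows are hypothetical, and "missing" is assumed here to mean None or an empty string.

```python
def remove_rows_with_missing(rows):
    """Keep only rows in which no value is None or an empty string."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


# Hypothetical sample: the second row has a missing Mixed Air Temperature.
sample = [
    {"Hour": 8, "Mixed Air Temperature": 61.2, "Supply Air Temperature": 55.0},
    {"Hour": 9, "Mixed Air Temperature": None, "Supply Air Temperature": 55.3},
    {"Hour": 10, "Mixed Air Temperature": 63.0, "Supply Air Temperature": 55.1},
]

cleaned = remove_rows_with_missing(sample)
print(len(sample), "->", len(cleaned))  # 3 -> 2
```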

6. Add “Select Columns in Dataset” module from “Manipulation” tab under “Data Transformation”.

On the right panel, click on “Launch column selector”, and you will get a pop-up window to

select the appropriate columns.

If columns are not listed in the drop-down, click on “ALL COLUMNS” and then on “NO COLUMNS”

Select the following columns to be included in the “Selected Columns”:

▪ Cooling Coil Output
▪ Mixed Air Output
▪ Mixed Air Temperature
▪ Outside Air Temperature
▪ Return Air Temperature
▪ Supply Air Temperature
▪ Total Supply Air Flow

Click on the check mark to accept the selected columns.

7. Insert “Split Data” module from “Data Transformation” > “Sample and Split”. Connect the

module to the existing “Apply SQL Transformation” module. Configure the following properties

for the module:

Splitting Mode: Split Rows
Fraction of rows in the first output dataset: 0.7
Randomized: Checked
Random Seed: 314
Stratified Split: False
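The idea behind these settings (a randomized, seeded, non-stratified 70/30 row split) can be sketched as below. This is a Python illustration only; the function split_rows is hypothetical, and using the same seed value here will not reproduce the exact partition that Azure ML generates, since the random number generators differ.

```python
import random


def split_rows(rows, fraction=0.7, seed=314):
    """Shuffle rows with a fixed seed and split them into (first, second) parts."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(rows)          # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    cut = int(round(fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))           # stand-in for the dataset rows
train, test = split_rows(rows)
print(len(train), len(test))       # 700 300
```

Running the function twice with the same seed returns the same two subsets, which is what makes results comparable between runs.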

8. Add the “One-Class Support Vector Machine” module from “Machine Learning” > “Initialize

Model”> “Anomaly Detection”. Set the Properties as below:

η = 0.01 (this parameter sets the trade-off between outliers and normal values; it is an upper bound on the fraction of outliers)

ε = 0.1 (this parameter is the stopping tolerance, which affects the number of iterations used when optimizing the model)

Add the “Train Anomaly Detection Model” module from “Machine Learning” > “Train” > “Train Anomaly Detection Model”.

Connect “Split Data” to “Train Anomaly Detection Model”

Connect “One-Class Support Vector Machine” to “Train Anomaly Detection Model”

9. Add two “Score Model” modules to the workspace and connect them as shown below.

Be sure to check the “Append score columns to output” in the Properties Panel.

“Run” the Experiment using the menu bar at the bottom of the canvas.

After it says “Finished Running”, use Visualize to view the scores.

Observe that the “Scored Labels” when plotted with “Scored Probabilities” indicate that a few

data points have “Scored Labels”=1, meaning they are outliers.

However, it is difficult to visualize the “outliers” and compare them to “normal” AHU operations.

For better visualization, we will use Azure ML’s integration with R and reuse the scripts and plots

that we developed in Part II.

In the next step i.e. Step 10 of the Experiment, we prepare the data required for visualization – we need

PC1 and PC2 scores (from the PCA model in R) for the 2D plot and the “Scored Labels” from the

AzureML model.

10. As shown in the picture below, please do the following:

▪ Remove the connection between “Split Data” and left “Score Model” and add a

connection between “Select Columns in Dataset” and left “Score Model”

(dotted red line shows the connection to be removed, and the solid red line is the

connection to be made)

▪ Add the pc.csv dataset (which you may have named “AHU studentXX pc.csv”) to the Experiment

If you did not upload a file for this dataset, you can use AHU PC Final.csv

▪ Add “Select Columns in Dataset” and in the properties pane, select columns “Scored

Labels” and “Scored Probabilities”

▪ Add “Add Columns”

▪ Add “Convert to CSV”

▪ Add “Execute R Script”

As mentioned earlier, the modules in the blue ellipse are for preparing the dataset and the R script for

the visualization.

In the Execute R Script module, use the Script shown below:

dataset1 <- maml.mapInputPort(1) # class: data.frame
library(ggplot2)
c=dataset1
c$date=strptime(c$TimeStamp,"%Y-%m-%d")
# keep only the on-the-hour rows
c1=subset(c,c$'Minute'==0)
# bin the outside air temperature to the nearest 5 degrees for coloring
oat=round(c1$'Outside Air Temperature'/5)*5
oat=as.factor(oat)
p=ggplot(data=c1)+aes(x=pc1a,y=pc2a,color=as.factor(oat)) + geom_point() #+stat_sum() #geom_count()
# circle in red the points scored as outliers; assign back to p so that print(p) includes them
p=p+ geom_point(data=subset(c1,c1$"Scored Labels"==1), color='red',size=4,shape=1)
print(p)

Please see “ExecuteRScript.txt” in “RScripts in AzureML” folder

From the menu at the bottom of the page, click Run. If successful, you will see a green check mark on

each module.

Right-click on the module “Execute R Script”, select “R Device”, and click “Visualize”. You should see the following visual:

The plot is an example of how anomalies can be identified using Support Vector Machines (SVMs). From

the plot, we see that the circled data points are considered abnormalities in the dataset.

In this lab, the SVM algorithm is given a set of training examples (the 70% of the original dataset defined earlier) and determines a decision function for whether a data point belongs to the single “normal” class or is an “anomaly”.

You can tune the model by adjusting the split fraction for the train and test set, along with adjusting the

nu (η) and epsilon (ε) parameters.

After you are satisfied with the model, you can save the “Trained Model” with a right-click on “Train Anomaly Detection…”

11. The “Trained Model” can now be deployed as a web service for “scoring” new data. To deploy

as a web service, please do the following (the modules are shown in the blue ellipse):

▪ Add a “Web Service Input”

▪ Add the “Trained Model”

▪ Add a “Score Model”

▪ Add a “Web Service Output”

▪ Connect the modules as shown above.

▪ Run the Experiment

Lastly, use the “Deploy Web Service [Classic]” to deploy the web service.

After it is deployed, you will see it listed under “WEB SERVICES”

This web service is used in the next Part to illustrate its use with scoring i.e. assigning FAULT/NORMAL

status to new real-time data from the AHU operations.

Part IV – Deploying the Model and Interactive Display

The “WEB SERVICE” will bring you to a page where you can input experimental values and calculate a

prediction. Click on “Test”.

1. Enter appropriate values for each variable and click on the check icon on the bottom right

corner.

2. The predicted result should appear at the bottom of the screen; the last two numerical values are “Scored Labels” and “Scored Probabilities”.

In the section below, we will review Windows PowerShell scripting to invoke AzureML web services.

Windows PowerShell Scripting

The steps in this section include:

• Review and test the PowerShell scripts to call Azure ML web services to calculate AHU Fault/No

Fault state and write it to the PI System

• Use a suitable display to review AHU operational data and its “scored” Fault status

It is assumed that you are familiar with PowerShell and the PowerShell Script Editor environment.

However, no PowerShell scripting experience is necessary to run these scripts.

For reference, please see:

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-consume-web-services

Azure ML web service details

Earlier, you saw how to use the “Test” button to manually invoke the web service.

To invoke the web services programmatically, use the “REQUEST/RESPONSE” link for code snippets.
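As a rough idea of what such a request looks like, the sketch below builds (but does not send) a request in the JSON shape used by classic Azure ML request/response services. It is written in Python rather than PowerShell and is not one of the lab scripts. The URL and API key are placeholders, the input name "input1" is the usual default and may differ for your service, and the column list assumes the web service input expects the seven columns selected in Part III; copy the real values from your service's REQUEST/RESPONSE page.

```python
import json
import urllib.request

# Placeholders (assumptions): take the real values from the REQUEST/RESPONSE page.
SERVICE_URL = "https://example.invalid/execute?api-version=2.0&details=true"
API_KEY = "YOUR-API-KEY"

# Assumed input schema: the columns selected for the model in Part III.
COLUMNS = [
    "Cooling Coil Output", "Mixed Air Output", "Mixed Air Temperature",
    "Outside Air Temperature", "Return Air Temperature",
    "Supply Air Temperature", "Total Supply Air Flow",
]


def build_request(values, url=SERVICE_URL, api_key=API_KEY):
    """Build a POST request that scores one row of AHU values."""
    if len(values) != len(COLUMNS):
        raise ValueError("expected one value per column")
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": COLUMNS,
                "Values": [[str(v) for v in values]],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical operating values, in the same order as COLUMNS.
req = build_request([35.0, 40.0, 62.5, 78.0, 72.1, 55.3, 9000])
print(req.get_method(), json.loads(req.data)["Inputs"]["input1"]["Values"])
# To actually call the service: urllib.request.urlopen(req).read()
```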

PowerShell scripts

The code snippets from the REQUEST/RESPONSE example are incorporated into the PowerShell scripts.

The scripts provided (see the lab folder) are:

Use the icon shown below to launch the PowerShell script editor to review the scripts.

A sample code snippet is as below:

Testing AHU Fault Status prediction using PowerShell script

Test_PI_Predict_AHUFault.ps1 allows you to invoke the AzureML web service from the desktop.

Note: Only the scripts with a filename starting with Test… can be invoked from the desktop.

Remaining script files (.PS1 files) are function calls and only intended to be programmatically called from

other scripts.

Right-click on the file name Test_PI_Predict_AHUFault; select <Run with PowerShell> from the menu.

You will be prompted for several inputs as shown below. This call may take a few seconds to run.

If you see “Execution Policy Change…”, enter “Y” and continue.

Windows Task Scheduler

A background task has been configured via Windows Task Scheduler. It runs the Main_AzureML.ps1 script every 10 minutes, which uses the following scripts:

▪ PI_GetAFValue.ps1

Retrieves the current AHU operations data from PI (using PI WebAPI) for the parameters (i.e.

Mixed Air Output, Mixed Air Temperature etc.) to be passed to AzureML web service

▪ PI_Predict_AHUFault.ps1

Calls the AzureML web service to calculate Fault/No-Fault state

▪ PI_PutAFValue.ps1

Writes the “returned value”, i.e. the Fault/No-Fault state (AF attribute “AHU_FaultCalc”), to the PI System
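The read, score, write sequence performed on each scheduled run can be sketched as below. This is a Python illustration of the flow only, not the contents of Main_AzureML.ps1; run_cycle and its three callables are hypothetical stand-ins for the three PowerShell scripts, and the stub values are made up so that the sketch runs offline.

```python
def run_cycle(get_values, predict_fault, put_value):
    """One scheduled cycle: read the inputs, score them, write the result back."""
    inputs = get_values()                 # role of PI_GetAFValue.ps1
    fault = predict_fault(inputs)         # role of PI_Predict_AHUFault.ps1
    put_value("AHU_FaultCalc", fault)     # role of PI_PutAFValue.ps1
    return fault


# Stand-ins so the sketch runs without a PI System or a web service.
written = {}
result = run_cycle(
    get_values=lambda: {"Mixed Air Temperature": 62.5, "Outside Air Temperature": 78.0},
    predict_fault=lambda inputs: 0,       # pretend the service returned Scored Labels = 0
    put_value=written.__setitem__,
)
print(result, written)
```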

PI Integrator – Continuous Publishing

PI Integrator is used for continuous publishing of daily operations data (every 10 minutes in this lab). The output from the Integrator, which includes the AHU Fault/No Fault state predicted by Azure ML, is used by the interactive web page in Shiny (see the next section) to refresh its display and show anomalies, if any.

The PI Integrator “Scored1” view has been configured to write the relevant AHU values for the current day to a text file, which is updated with new values every 10 minutes.

Please use the instructions from Part I as a guide to configure PI Integrator for continuous publishing.

You can copy the Asset View configured in Part I into another View and change the publication schedule to every 10 minutes, with a time range from “t” (today at midnight) to “*” (now); this allows daily AHU operations data to be exported.

In the next section, we will review the development and use of an interactive web display to visualize

AHU operational states and predicted anomalies, if any.

Shiny – by RStudio – a Web Application Framework for R

https://shiny.rstudio.com/

This section illustrates how we reuse the plots developed in Part II for an interactive web-based display

to visualize and troubleshoot AHU operations data.

Double-click on server.R to open it in RStudio.

In RStudio, with the server.R file open, click on Run App. It will bring up the following interactive web page.

Air Handler Unit - Operations display – each point or bubble shows hourly AHU “state” for 2016

➢ x-axis=principal component 1, y-axis= principal component 2

➢ Size of the bubble corresponds to the Count of an operational state, i.e. how often is the AHU at

an operational state

➢ Color of the bubble indicates outside air temperature (OAT) - ranges from 45 ºF to 110 ºF. Colder

days (orange bubbles) are on the far right and towards the top and hotter days (pink bubbles)

are on the far left and towards the top

➢ Select OAT (outside air temperature) to view all AHU operations at a specified value; in the plot

OAT is set to 100 ºF and the corresponding operating states are shown as black circles

➢ Select SA (supply air flow, cfm) to view all AHU operations at specified value; black circles show

for the selected value (this is not shown in the plot)

➢ Select a region using a brushing action (blue shaded rectangle in the plot) to see corresponding

AHU operations data in the Table. The plot shows an outlier i.e. extreme operating point in the

blue shaded rectangle – and the details are shown in the table as 5/18/2016 17:00 data

➢ Select a date – the black dotted line shows the progression of the AHU operational state (8am to

8pm) in the direction of the arrow. Detailed hour-by-hour operations data is displayed in the

table below the plot

Anomalies or faulted AHU states, if any, are indicated with red circles

Click on “Show today’s operation” to overlay “current day’s operations” – along with anomalies, if any,

on the web page.
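The bubble sizes described above come from counting how many hourly points share the same rounded (PC1, PC2) position. A minimal Python sketch of that counting, assuming scores are rounded to one decimal place as in the Shiny code and using made-up scores, is:

```python
from collections import Counter


def state_counts(scores):
    """Count points per operating state, i.e. per (PC1, PC2) pair rounded to 0.1."""
    return Counter((round(pc1, 1), round(pc2, 1)) for pc1, pc2 in scores)


# Hypothetical PC scores for four hourly readings.
scores = [(1.23, -0.48), (1.18, -0.52), (1.21, -0.49), (3.02, 2.97)]
counts = state_counts(scores)
print(counts[(1.2, -0.5)], counts[(3.0, 3.0)])  # 3 1
```

A state visited three times is drawn as a larger bubble than one visited once, which is why isolated small bubbles far from the main cloud stand out as candidates for anomalies.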

Shiny ui.R and server.R code snippets are shown below:

library(shiny)
ui=fluidPage(
  checkboxInput("saON", "Filter by SA", FALSE),
  conditionalPanel(
    condition = "input.saON == 1",
    sliderInput("sa","Supply Air cfm",min=1000,max=19000,value=4000,step=1000)),
  checkboxInput("oatON", "Filter by OAT", FALSE),
  conditionalPanel(
    condition = "input.oatON == 1",
    sliderInput("oat","OAT",min=50,max=110,value=60,step=5)),
  checkboxInput("dateON", "Filter by Date", FALSE),
  conditionalPanel(
    condition = "input.dateON == 1",
    textInput("date", "Filter by Date", "03/01/2016")),
  plotOutput("ahuops",brush="ahuops_brush"),
  dataTableOutput("ahubrushrows"),
  dataTableOutput("ahudayrows")
)

library(shiny)
library(ggplot2)
server=function(input,output,session){
  c1=read.csv("ahu03s.csv", h=T, sep=',')
  c1$pc1a=round(c1$pc1*10)/10
  c1$pc2a=round(c1$pc2*10)/10
  c1$date=strptime(c1$DateTime,"%m/%d/%Y")
  output$ahuops=renderPlot({
    c2=c1
    if (input$oatON & !(input$saON)) c2=subset(c1,c1$oa==input$oat)
    if (!(input$oatON) & (input$saON)) c2=subset(c1,c1$sa==input$sa)
    if ((input$oatON) & (input$saON)) c2=subset(c1,c1$sa==input$sa & c1$oa==input$oat)
    oat=as.factor(c1$oa)
    p=ggplot(data=c1)+aes(x=pc1a,y=pc2a,color=oat) + geom_count()
    if ((input$oatON) | (input$saON)) p=p+ geom_point(data=c2, color='black',size=4,shape=1)
    if (input$dateON) p=p+geom_path(data=subset(c1,c1$date==strptime(input$date,"%m/%d/%Y")),color='black',linetype=2,size=1,arrow = arrow(length = unit(0.5, "cm"),type="closed"))
    p
  })
  output$ahubrushrows=renderDataTable({
    #if (is.null(input$ahuops_brush$x)) return()
    #else
    subset(brushedPoints(c1,input$ahuops_brush,xvar="pc1a", yvar="pc2a"),select=-c(oa,sa,pc1,pc2,pc1a,pc2a,date))
  })
  output$ahudayrows=renderDataTable({
    c2=subset(c1, c1$date==strptime(input$date,"%m/%d/%Y"))
    c2=subset(c2,select=-c(oa,sa,pc1,pc2,pc1a,pc2a,date))
    c2
  })
}

This concludes the hands-on portion of the lab.

Appendix A contains code samples for using PI WebAPI from within the R environment.

References include PI System customer examples of machine learning.

In summary, in this lab, we covered the following learning objectives:

▪ Extracting data from the PI System using PI Integrator

▪ Using the PI System data in R, data cleansing, feature selection, model development etc.

▪ Using the PCA (principal component) model with Shiny https://shiny.rstudio.com/ to create an

interactive display for visual analytics; also using SVM (support vector machine) for classification

and prediction of AHU fault/no-fault state

▪ Using Azure ML with PI System data

▪ Deploying the machine learning model for continuous execution with real-time data

▪ Understanding the end-to-end data science process – data retrieval, data cleansing, shaping and

preparation with meta-data context, feature selection via domain specific guidelines, applying

machine learning methods, visualizing the results and operationalizing the findings

Appendix A: Using PI Web API with R

If you prefer to stay within the R environment, you can read/write PI System data using PI WebAPI

which is available as part of the PI Developer technologies.

https://livelibrary.osisoft.com/LiveLibrary/web/ui.xql?action=html&resource=publist_home.html&filter=developer

For R code samples, please refer to the files with “piwebapi” prefix in their names.

The PI WebAPI approach is particularly useful when the predictive machine learning models are complex

– such as those based on decision trees, neural net etc. and hence cannot be deployed using PI AF

Analysis.

Reference Materials

https://cran.r-project.org/

https://shiny.rstudio.com/

OSIsoft TechCon 2016 Lab Notes Use Data Science for Machine Learning and Predictions Based on Your

PI System Data

http://stackoverflow.com/questions/22309236/options-for-deploying-r-models-in-production

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-consume-web-services

Customer examples:

Deschutes Brewery: Reducing Beer Production Time with Predictions

Devon Energy: The Importance of a Real-Time Data Infrastructure for IIOT, Advanced Analytics, and Big

Data

UC 2017 – Juliette Spinnato, Total

Real-time Estimations and Online Learning for Industrial Assets at Total

Since 2015, one of Total’s ambitions has been to promote a strong data culture among its collaborators to move towards a data-

driven and a digitalized industry. In that context, the Refining and Chemical branch has built a data analytics center to deal with

industrial data science matters. Using new data science methodology, tools and infrastructure, machine learning models have been implemented to provide the Business with real-time estimations of various parameters for our industrial assets. In this presentation

we will focus on the estimation of the percentage of gasoil remaining in the residue of a Crude Distillation Unit (CDU). Based on

conditional parameters (flow, temperature, pressure …), a data-driven model has been built in R and Python and implemented in the refinery’s PI System. In order to anticipate possible model degradation over time, an online retraining has also been developed.

In this presentation, the main focus will be to present the technical frame to develop such a workflow: needed infrastructure, data

science methodology and algorithms, useful tools and software. In order to get the best out of this presentation, attendees are expected to be familiar with the general framework of a data-driven project without necessarily being an expert in data science.