What's New in the Timeseries Toolkit for IBM InfoSphere Streams V4.0

Page 1: What's New in the Timeseries Toolkit for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

Timeseries Toolkit – What’s New!

IBM InfoSphere Streams Version 4.0

James Cancilla

Streams Toolkit Developer

For questions about this presentation, contact James Cancilla - [email protected]

Page 2

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.


Page 3

Agenda

What’s New!

AnomalyDetector Operator

KMeansClustering Operator

DSPFilterFinite Operator

Page 4

Timeseries – What’s New!

9 new operators:

– AnomalyDetector

– AutoForecaster2

– CrossCorrelate2

– CrossCorrelateMulti

– DSPFilter2

– DSPFilterFinite

– DWT2

– KMeansClustering

– VAR2

7 new functions:

– laggedCrosscorrelate()

– laggedConvolve()

– dtw()

– dtw_itakura()

– dtw_sakoe_chiba()

– lcss()

– lpNorm()

Page 5

AnomalyDetector Operator

Performs online anomaly detection on a time series

Detects anomalous subsequences

Allows the sensitivity of the anomaly detection to be adjusted

Page 6

AnomalyDetector Operator

Many industries need to be able to detect anomalies as they occur in real time, for example:

– Energy & Utility

– Natural Resource

– Health Care

– Network Intrusion

Page 7

AnomalyDetector Operator

How it works:

Assume the following graph:

Page 8

AnomalyDetector Operator

How it works:

1. As data arrives, it is saved in memory. We will refer to this as the “reference pattern”.

2. As data arrives, another pattern is also saved in memory. We will refer to this as the “current pattern”.

Page 9

AnomalyDetector Operator

How it works:

3. Each time a tuple arrives, the current pattern is updated and then compared against a subsequence of the reference pattern

4. The compare operation will generate a score

[Figure: Score = X1; successive comparisons labeled X1, X2, X3, X4, X5]

Page 10

AnomalyDetector Operator

How it works:

5. A final score is calculated from the comparisons with the reference subsequences

6. If the score is equal to or greater than the confidence value specified for the operator, an output tuple is generated containing the current (anomalous) pattern
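Steps 1 to 6 can be sketched in Python as follows. This is a hypothetical illustration, not the operator's implementation: the slides do not say how two patterns are compared or how the comparisons are combined, so the sketch assumes a Euclidean distance per comparison and takes the smallest distance to any reference subsequence as the final score (a pattern unlike every reference subsequence scores high). It also assumes the reference pattern is the data immediately preceding the current pattern. The class and argument names are invented to mirror the patternLength, referenceLength, patternCount, stepSize and confidence parameters.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Tuple


class AnomalyDetectorSketch:
    """Hypothetical sketch of the sliding-pattern scheme in steps 1-6.

    Assumptions (not from the slides): each comparison is a Euclidean
    distance, the final score is the smallest distance to any reference
    subsequence, and the reference pattern is the `reference_length`
    values that immediately precede the current pattern.
    """

    def __init__(self, pattern_length: int, reference_length: int,
                 pattern_count: int, step_size: int = 1,
                 confidence: float = 1.0) -> None:
        if not 0 < pattern_length <= reference_length:
            raise ValueError("need 0 < pattern_length <= reference_length")
        self.pattern_length = pattern_length
        self.reference_length = reference_length
        self.pattern_count = pattern_count
        self.step_size = step_size
        self.confidence = confidence
        # Oldest values form the reference pattern, newest the current one.
        self.history: Deque[float] = deque(maxlen=reference_length + pattern_length)

    def process(self, value: float) -> Optional[Tuple[List[float], float]]:
        """Consume one tuple; return (anomalous pattern, score) or None."""
        self.history.append(value)
        if len(self.history) < self.reference_length + self.pattern_length:
            return None  # not enough data yet for both patterns
        data = list(self.history)
        reference = data[:self.reference_length]
        current = data[self.reference_length:]
        # Steps 3-4: compare the current pattern with successive reference
        # subsequences, shifting the start by step_size each time.
        scores = []
        for i in range(self.pattern_count):
            start = i * self.step_size
            if start + self.pattern_length > self.reference_length:
                break
            subsequence = reference[start:start + self.pattern_length]
            scores.append(math.dist(current, subsequence))
        if not scores:
            return None
        # Step 5: combine the comparisons into one final score.
        score = min(scores)
        # Step 6: emit the current pattern if the score reaches the confidence.
        if score >= self.confidence:
            return current, score
        return None
```

Feeding a periodic signal such as 0, 1, 0, 1, ... keeps the score at 0, because the current pattern exactly matches some reference subsequence; a sudden jump to 9 produces a pattern far from every subsequence, so it is reported once its score reaches the confidence threshold.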

Page 11

AnomalyDetector Operator

Parameters:

– inputTimeseries: Specifies the input attribute containing the time series data

– inputTimestamp: Specifies the input attribute containing the timestamp data

– patternLength: Specifies the length of the ‘current pattern’

– referenceLength: Specifies the number of tuples to store as part of the ‘reference pattern’

– patternCount: Specifies the number of subsequence patterns that the current pattern is compared against

– stepSize: Specifies how many steps the sliding window shifts (default value is 1)

– confidence: Limits the output to only those sequences that have a score equal to or greater than the specified value

Output Functions:

– getSubsequence(): Returns a list<float64> that contains the anomalous pattern

– getScore(): Returns the calculated score of the anomalous pattern

– getStartTime(): Returns the start time of the anomalous pattern

– getEndTime(): Returns the end time of the anomalous pattern

Page 12

AnomalyDetector Operator

Additional Information

AnomalyDetector Operator – Info Center Page

– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.analysis$AnomalyDetector.html

Page 13

KMeansClustering Operator

Clustering analysis is a popular technique used to find natural groupings in a set of objects

Cluster analysis is useful in multiple fields such as biology, medicine, business and social media

– In medicine, cluster analysis may be used to distinguish between different types of blood and tissue samples

– In social media, cluster analysis can be used to distinguish between different groups within large communities

Page 14

KMeansClustering Operator

The KMeansClustering operator uses the K-Means algorithm to find groups within a set of data

Summary of K-Means algorithm:

1. Determine how many clusters (k) you want to find

2. Randomly define an initial mean value for each cluster

3. Using a set of training data, determine which mean value each data point is closest to

4. Once all of the data points have been assigned to a mean, recalculate each mean as the average of the data points assigned to it

5. Repeat steps 3 & 4 until the mean values no longer move
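The five steps above can be sketched as plain Lloyd's k-means in Python. This is an illustrative sketch, not the operator's code: picking k distinct data points as the random initial means is one common reading of step 2, the `seed` argument only loosely mirrors the operator's seed parameter, and the function names `kmeans` and `assign` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Point = List[float]


def kmeans(data: Sequence[Point], k: int, seed: Optional[int] = None,
           max_iter: int = 100) -> List[Point]:
    """Lloyd's k-means following steps 1-5 (a sketch, not the operator)."""
    rng = random.Random(seed)
    # Step 2: use k distinct data points as random initial means.
    means = [list(p) for p in rng.sample(list(data), k)]
    for _ in range(max_iter):
        # Step 3: assign every data point to its nearest mean.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            idx = min(range(k), key=lambda j: math.dist(p, means[j]))
            groups[idx].append(list(p))
        # Step 4: move each mean to the average of its assigned points.
        new_means = []
        for j, group in enumerate(groups):
            if group:
                new_means.append([sum(col) / len(group) for col in zip(*group)])
            else:
                new_means.append(means[j])  # an empty cluster keeps its mean
        # Step 5: stop once the means no longer move.
        if new_means == means:
            break
        means = new_means
    return means


def assign(point: Point, means: Sequence[Point]) -> int:
    """Index of the nearest mean, i.e. the cluster a new point falls into."""
    return min(range(len(means)), key=lambda j: math.dist(point, means[j]))
```

For example, `kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2, seed=0)` separates the two groups, and `assign` then places a new point such as [9, 9] in the cluster around (10.3, 10.3).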

Page 15

KMeansClustering Operator

How the operator works:

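[Figure: incoming data (sample1, sample2, sample3, sample4, ..., sampleN, sampleN+1, ...); the initial samples are used to build the generated k-means model; subsequent samples are output with their assigned cluster (sampleN,<cluster#>, sampleN+1,<cluster#>)]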

Page 16

KMeansClustering Operator

Inputs

The KMeansClustering operator can accept data in two formats:

– As a list<float64>, where each tuple represents a single data point with multiple dimensions

• For example, a list with the value [10, 20] may represent the x- and y-coordinates of a single data point

– As a single float64 value

• In this case, the operator must be configured with a window that has a fixed size

• The size of the window represents the number of dimensions in a single data point
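For the single-float64 input format, the way a fixed-size window turns individual values into multi-dimensional data points can be sketched as below. The sketch assumes a tumbling (non-overlapping) count window whose size equals the number of dimensions; the slides only say the window must have a fixed size, so a sliding window is an equally possible reading. `windowed_points` is a hypothetical helper, not part of the toolkit.

```python
from typing import Iterable, Iterator, List


def windowed_points(values: Iterable[float], dimensions: int) -> Iterator[List[float]]:
    """Group single float64 values into data points of `dimensions` values.

    Assumption: a tumbling count window, so each full window becomes one
    data point and a trailing partial window is never emitted.
    """
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == dimensions:
            yield window  # one complete data point
            window = []   # start the next, non-overlapping window
```

With a window size of 2, the stream 10, 20, 30, 40 yields the data points [10, 20] and [30, 40], matching the x/y example above.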

Page 17

KMeansClustering Operator

Parameters

– initSamples: Specifies the initial number of tuples to use to build the clusters

– clusters: Specifies the number of clusters to generate

– inputData: Specifies the attribute that contains the data points

– initMeans: Specifies the initial set of cluster means

– seed: Specifies the seed value to use when randomly generating the initial mean values

– clusterLabels: Allows for setting the labels to use for each of the clusters

Output Functions

– getDataPoint(): Returns the data point that was scored against the clusters

– getClusterIndex(): Returns the index of the cluster that the data point was assigned to

– getClusterMean(): Returns the mean of the cluster that the data point was assigned to

– getClusterVariance(): Returns the variance of the cluster that the data point was assigned to

– getClusterLabel(): Returns the label of the cluster that the data point was assigned to
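What the output functions report for each scored data point can be sketched as follows. The `ScoredPoint` fields mirror getDataPoint(), getClusterIndex(), getClusterMean(), getClusterVariance() and getClusterLabel(), but the types and the model layout are assumptions: in particular, the sketch treats cluster variance as the per-dimension population variance of the training points in that cluster, which the slides do not specify. `build_model` and `score` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# (mean, per-dimension variance, label) for one cluster: a hypothetical layout.
Cluster = Tuple[List[float], List[float], str]


@dataclass
class ScoredPoint:
    data_point: List[float]        # compare getDataPoint()
    cluster_index: int             # compare getClusterIndex()
    cluster_mean: List[float]      # compare getClusterMean()
    cluster_variance: List[float]  # compare getClusterVariance() (assumed per dimension)
    cluster_label: str             # compare getClusterLabel()


def build_model(groups: Sequence[Sequence[Sequence[float]]],
                labels: Sequence[str]) -> List[Cluster]:
    """Summarize already-clustered training points as mean, variance and label."""
    model: List[Cluster] = []
    for points, label in zip(groups, labels):
        n = len(points)
        columns = list(zip(*points))  # one tuple of values per dimension
        mean = [sum(col) / n for col in columns]
        variance = [sum((v - m) ** 2 for v in col) / n
                    for col, m in zip(columns, mean)]
        model.append((mean, variance, label))
    return model


def score(point: Sequence[float], model: Sequence[Cluster]) -> ScoredPoint:
    """Assign a point to the nearest cluster mean and report that cluster."""
    index = min(range(len(model)), key=lambda j: math.dist(point, model[j][0]))
    mean, variance, label = model[index]
    return ScoredPoint(list(point), index, mean, variance, label)
```

Here the labels play the role of the clusterLabels parameter: scoring [1, 1] against clusters built from {[0, 0], [2, 0]} and {[10, 10], [10, 12]} reports index 0 and the first label.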

Page 18

KMeansClustering Operator

Additional Information

KMeansClustering Operator – Info Center Page

– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.modeling$KMeansClustering.html

Page 19

DSPFilterFinite Operator

Unlike the DSPFilter operator, the DSPFilterFinite operator operates on signal segments

The operator ingests a complete time series signal and filters it

There are many applications where segments of a signal need to be filtered

– For example, a call center may want to filter out the noise in phone calls prior to analysis

• Each phone call from each call center employee can be considered a finite-length time series signal

Page 20

DSPFilterFinite Operator

The DSPFilterFinite operator can set the filter parameters on a per-tuple basis

Each incoming time series segment can be filtered using a different set of filter parameters

This allows for a real-time, dynamic filter bank

– “a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal.” (Filter bank – Wikipedia)

Page 21

DSPFilterFinite Operator

Page 22

DSPFilterFinite Operator

Page 23

DSPFilterFinite Operator

Parameters

– inputTimeSeries: Specifies the input attribute containing the signal segment

– filterType: Specifies the type of filter to apply (lowPass or highPass)

– samplingRate: Specifies the sampling rate

– cutOffFrequency: Specifies the cutoff frequency

– xcoef: Allows for specifying the x-coefficients of the Butterworth filter

– ycoef: Allows for specifying the y-coefficients of the Butterworth filter

– coefParameterFile: Allows for specifying a file containing the x- and y-coefficients of the Butterworth filter

Outputs

– filteredTimeSeries(): Returns the filtered time series

– getInputTimeSeries(): Returns the input time series (useful when using windowing)
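To make the filterType, samplingRate, cutOffFrequency, xcoef and ycoef parameters concrete, the sketch below filters one finite segment with a first-order Butterworth low-pass or high-pass filter designed by the bilinear transform. Several things are assumptions rather than facts about DSPFilterFinite: the filter order (first), the design method, and the convention that x-coefficients are the feed-forward terms applied to the input while y-coefficients are the feedback terms applied to past outputs, with y_coef[0] normalized to 1. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def butterworth_first_order(filter_type: str, sampling_rate: float,
                            cutoff_frequency: float) -> Tuple[List[float], List[float]]:
    """Return (x_coef, y_coef) for a first-order Butterworth filter.

    Bilinear transform with frequency pre-warping; y_coef[0] is 1.
    """
    if not 0 < cutoff_frequency < sampling_rate / 2:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    k = math.tan(math.pi * cutoff_frequency / sampling_rate)  # pre-warped cutoff
    a1 = (k - 1) / (k + 1)
    if filter_type == "lowPass":
        b = k / (1 + k)
        return [b, b], [1.0, a1]
    if filter_type == "highPass":
        b = 1 / (1 + k)
        return [b, -b], [1.0, a1]
    raise ValueError(f"unknown filter type: {filter_type}")


def filter_segment(segment: Sequence[float], x_coef: Sequence[float],
                   y_coef: Sequence[float]) -> List[float]:
    """Apply y[n] = (sum_k x_coef[k]*x[n-k] - sum_{k>=1} y_coef[k]*y[n-k]) / y_coef[0].

    The whole finite segment is filtered in one call, starting from rest
    (zero initial state), so each segment is processed independently.
    """
    out: List[float] = []
    for n in range(len(segment)):
        acc = sum(x_coef[k] * segment[n - k] for k in range(len(x_coef)) if n >= k)
        acc -= sum(y_coef[k] * out[n - k] for k in range(1, len(y_coef)) if n >= k)
        out.append(acc / y_coef[0])
    return out
```

Because each call takes its own coefficients, filtering successive segments with different cutoff frequencies mirrors the per-tuple parameter setting described earlier; passing hand-written coefficient lists to `filter_segment` plays the role of the xcoef and ycoef parameters.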

Page 24

DSPFilterFinite Operator

Additional Information

DSPFilterFinite Operator – Info Center Page

– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.analysis$DSPFilterFinite.html

Page 25

Questions?