SPL Enhancements InfoSphere Streams Version 3.0

Howard Nasgaard, SPL Compiler, SPL Runtime & Standard Toolkit Development

© 2012 IBM Corporation



Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

The information on the new product is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information on the new product is for informational purposes only and may not be incorporated into any contract. The information on the new product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.



Agenda

Walk-through of new SPL and Standard Toolkit changes and additions


Problem

How do I ingest and work on XML data in a streams application?

XML Support

‘xml’ added as a first-class datatype
  – stream<xml x> ....
  – Checked for form

Can also specify a schema
  – stream<xml<"mySchema"> x>
  – Checked for form and validity

Schema can be file or web-based URI
  – Recommend local file
    • data directory root if relative

XML Support - Conversion

rstring <-> xml
  – xml x = (xml) "<doc>...</doc>"r;
    • Checked for form
  – xml<"schema"> xs = (xml<"schema">)"<doc>...</doc>"r;
    • Checked for form and validity
  – rstring s = (rstring)x;

Form and validity checking done only when needed
  • xml x = (xml)xs;
    – Not checked
  • xs = (xml<"schema">)x
    – Checked for validity

Validation failure at runtime will throw an exception

Set of built-in functions available to convert to xml
  – Can be used with return code in logic

XML Support - Conversion

xml <-> tuple
  – type T = tuple<int32 i, rstring s>;
  – mutable T t = {....};
  – mutable xml x = (xml)t;
  – Converted to xml in “Serialized Tuple Model” format
    • Schema provided with Streams (serializedTupleModel.xsd)
  – mutable T t = (T)x;
    • Validated against tuple model schema

No conversion from ustring to xml directly
  – Must go through rstring

XML Literals

New literal type added for XML
  – "<a b=\"hi\">x</a>"x;

String syntax extended to ease use in XML literals
  – ' (single quote) can now be used to delimit strings
    • '<a b="hi">x</a>'x;
  – Embedded new-lines are now legal in string literals
    • '<a>
         <b>x</b>
       </a>'x;
  – Can be used in all string literals

XML Support - Encoding

XML literals are assumed to be in UTF-8 encoding
  – Source files are in UTF-8 so any explicit encoding must be too
  – '<?xml ... encoding="xxxxx"?> ...'x;
    • Compile-time error if xxxxx is not UTF-8

rstring expressions that contain XML data...
  – Are assumed to be encoded in UTF-8 if no encoding specified
  – Must contain valid characters if encoding is specified
    • Error raised at cast time if not

XML Support – Source/Sink Operators

Source operators can read attributes of xml type
  – stream<xml x> In = FileSource() {...}
  – xml is checked for form and validated (if there is a schema)

Sink operators can write attributes of xml type
  – () as Out = FileSink(stream<xml x> In) {...}

csv, txt as quoted xml literals

bin in serialized form
  • No validation (assumed valid)

Source can read “traditional” xml file in line or block
  – Requires XMLParse operator to be useful
  – No validation in Source operators

Sink operators cannot write XML using line or block format
  – XML must first be converted to rstring or blob.

XML Support - XMLParse Operator

Converts xml data to tuples

Input attribute can be rstring, ustring, blob, xml

Input data can be in multiple lines/blocks with rstring, ustring or blob
  – “line” format can be used to read “traditional” XML files
  – If ustring any encoding directive is ignored
  – Operator validates xml

Input in rstring, ustring or blob can contain multiple, sequential, XML documents

A window marker punctuation is generated at the end of each XML document

XMLParse will not produce attributes of xml type

XML Support – XMLParse Operator

Generates one or more output streams

Each stream
  – Corresponds to a subtree within the XML (i.e., an element)
  – Requires a “trigger” expression

Trigger expression
  – rstring containing an XPath expression that defines a node set
  – Tuples are generated for each node in the node set
  – Must start at the root of the document
    • param trigger : "/doc/a";

Two mechanisms to specify the mapping of XML to tuple content
  – Implicit: content derived from the output stream schema
  – Explicit: content specified in the output clauses

XML Support – Implicitly deriving tuple content

The tuple schema representation of an XML element is:
  – type Element = tuple<map<rstring, rstring> _attrs, rstring _text [, NestedTuples]*>
  – _attrs contains all the attribute name/value pairs
  – _text contains the text content between the open/close tag
  – Additional tuples or lists of tuples represent nested elements

Defining an output stream schema that follows this notion allows the XMLParse operator to generate a SAX parser that will extract the desired information

An example:

XML Support – XMLParse example

Things to note:
  – The nested tuples for sub-elements ‘d’ and ‘e’ do not have a map for attributes. Not needed.
  – The trigger expression "/a" always starts with ‘/’

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

type aElem = tuple<map<rstring,rstring> _attrs, rstring _text, tuple<rstring _text> d, list<tuple<rstring _text>> e>;

stream<aElem> O = XMLParse(…) {
    param trigger : "/a";
}
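The implicit mapping can be approximated outside Streams. The following is only a sketch in Python using the standard library's ElementTree. The function name `element_to_tuple` is hypothetical, and stripping whitespace from `_text` is an assumption made here, not documented XMLParse behavior.

```python
import xml.etree.ElementTree as ET

DOC = '<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>'

def element_to_tuple(elem):
    """Model element 'a' as the aElem shape: _attrs, _text, d, e (illustrative only)."""
    return {
        "_attrs": dict(elem.attrib),                            # map<rstring,rstring> _attrs
        "_text": (elem.text or "").strip(),                     # rstring _text (whitespace stripped by assumption)
        "d": {"_text": elem.findtext("d")},                     # tuple<rstring _text> d
        "e": [{"_text": e.text} for e in elem.findall("e")],    # list<tuple<rstring _text>> e
    }

root = ET.fromstring(DOC)
a_elem = element_to_tuple(root)
print(a_elem)
```

The single 'd' element becomes one nested record, while the repeated 'e' element becomes a list, matching the tuple and list<tuple> types in aElem.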

XML Support – Another XMLParse example

Things to note:
  – Map has been replaced by scalar b and list<scalar>[1] c
    • param flatten : attributes
    • Put XML attributes in SPL [list]scalar attributes of the same name
  – SPL attribute b has type int32
    • Everything in XML is considered rstring by default
    • Specifying a non-rstring type causes a conversion

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

type aElem = tuple<int32 b, list<rstring>[1] c, rstring _text, tuple<rstring _text> d, list<tuple<rstring _text>> e>;

stream<aElem> O = XMLParse(…) {
    param trigger : "/a";
          flatten : attributes;
}

XML Support – Still another XMLParse example

Things to note:
  – The nested tuple<rstring _text> d is reduced to rstring d
    • rstring SPL attributes not named _text are assumed to refer to the text content of a nested element by that name
  – The list<tuple<rstring _text>> e is reduced to list<rstring> e
  – The map is back? Why?

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

type aElem = tuple<map<rstring,rstring> _attrs, rstring _text, rstring d, list<rstring> e>;

stream<aElem> O = XMLParse(…) {
    param trigger : "/a";
          flatten : elements;
}

XML Support – Implicitly deriving tuple content

Reduction of maps/tuples to scalars is referred to as flattening

Reduction of maps/tuples to scalars can only be done for XML attributes OR elements, not both
  – rstring b could mean element b or attribute b.
  – You must tell the XMLParse operator which one you want
    • param flatten : attributes/elements/none (default none)

XML attribute or element content not represented in the tuple schema will be ignored
  – You do not need to fully represent the XML structure in the schema

My XML just happens to have an element named _text.
  – params textName and AttributeName can change the default values of _text and _attrs.

XML Support – Explicitly specifying tuple content

As with other operators, expressions in the output clause assign values to the output tuple attributes

Expressions use custom output functions to specify the mapping of XML data to SPL attributes
  – rstring XPath(rstring xpathExpn)
  – <tuple T> XPath (rstring xpathExpn, T tupleLiteral)
  – list<rstring> XPathList(rstring xPathExpn)
  – <any T> list<T> XPathList(rstring xpathExpn, T elements)
  – map<rstring,rstring> XPathMap (rstring xpathExpn)

Each of these functions requires an XPath expression relative to the trigger expression, or the containing expression

Examples, please!

XML Support – Example of using explicit specification

Things to note:
  – Trigger expression says output a tuple for each “e” subtree
  – XPath expression "text()" specifies what to get from the ‘e’ subtree, the ‘e’ element’s text content in this case
  – Everything else in the XML is ignored
  – Two tuples would be output for the example XML
  – No naming convention for tuple attributes

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

stream<rstring s> O = XMLParse(...) {
    param trigger : "/a/e";
    output O : s = XPath("text()");
}

XML Support – Example of using explicit specification

Things to note:
  – Trigger expression says output a tuple for each “a” subtree
  – XPath expression "@b" specifies that we want the content of XML attribute ‘b’
  – Must explicitly cast output of COFs if not rstring
  – One tuple would be output for the example XML

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

stream<int32 i> O = XMLParse(...) {
    param trigger : "/a";
    output O : i = (int32)XPath("@b");
}

XML Support – Example of using explicit specification

Things to note:
  – Trigger expression says output a tuple for each “a” subtree
  – XPath expression "e/text()" specifies that we want the content of XML element ‘e’ and the XPathList function returns a list of all ‘e’ contents
  – One tuple would be output for the example XML with two values in list l

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

stream<list<rstring> l> O = XMLParse(...) {
    param trigger : "/a";
    output O : l = XPathList("e/text()");
}

XML Support – Example of using explicit specification

Things to note:
  – XPath expression "@*" specifies we want all attributes
  – XPathMap function returns the map containing all the attributes
  – One tuple will be output with a map containing two key/value pairs

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

stream<map<rstring, rstring> attrs> O = XMLParse(...) {
    param trigger : "/a";
    output O : attrs = XPathMap("@*");
}
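The four explicit-specification examples can be compared side by side with a rough Python model. This is a sketch only: ElementTree does not support "text()" or "@b" steps, so each XPath call is replaced by the equivalent ElementTree access. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = '<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>'
root = ET.fromstring(DOC)  # the <a> element, i.e. the node matched by trigger "/a"

# trigger "/a/e" with s = XPath("text()"): one tuple per <e> element
e_tuples = [{"s": e.text} for e in root.findall("e")]

# trigger "/a" with i = (int32)XPath("@b"): attribute value cast to an integer
b_tuple = {"i": int(root.get("b"))}

# trigger "/a" with l = XPathList("e/text()"): one tuple holding a list
l_tuple = {"l": [e.text for e in root.findall("e")]}

# trigger "/a" with attrs = XPathMap("@*"): one tuple holding all attributes
attrs_tuple = {"attrs": dict(root.attrib)}

print(e_tuples, b_tuple, l_tuple, attrs_tuple)
```

The trigger decides how many tuples are produced (two for "/a/e", one for "/a"), while the output function decides what each tuple contains.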

XML Support – XMLParse Operator

Some other behavior
  – If an SPL attribute assignment does NOT contain XPath, XPathList or XPathMap, then the expression will be resolved from the input stream
  – If an SPL attribute assignment is omitted, the XMLParse operator will try to generate an implicit assignment using a default XPath or XPathList expression
  – The parsing parameter controls its error processing:
    • strict: logs an error and terminates the operator
    • permissive: logs an error and continues

XML Support – spl-schema-from-xml utility

Given complex XML, crafting either tuple schemas for implicit generation, or output clauses for explicit generation, could be difficult

Enter the spl-schema-from-xml utility
  – Given a representative XML document it will:
    • generate a set of typedefs for the tuple schema to support the full XML
    • optionally generate output clauses for each trigger specified
    • optionally generate a schema for the XML
    • optionally generate a composite operator wrapping the XMLParse operator
    • optionally generate a main composite with a source, sink and a call to the parser composite
  – You can tailor the output from the utility to suit your needs
  – You can tell it to flatten elements or attributes


Sample spl-schema-from-xml output

<a b="1" c="vc1"> va1 <d>vd1</d> <e>ve1a</e> <e>ve1b</e></a>

spl-schema-from-xml -o a.spl -t '/a' --composite Parse --mainComposite Main data/test1.xml

Sample spl-schema-from-xml output

use spl.XML::*;

composite Parse(input input0; output output0) {
    type
        static a_type = tuple<map<rstring, rstring> _attrs, rstring _text, a_d_type d, list<a_e_type> e>;
        static a_d_type = tuple<rstring _text>;
        static a_e_type = tuple<rstring _text>;
    graph
        stream<a_type> output0 = XMLParse(input0) {
            param
                trigger : "/a";
                parsing : permissive; // log and ignore errors
            output output0 :
                _attrs = XPathMap("@*"),
                _text = XPath("text()"),
                d = XPath("d", {_text = XPath("text()")}),
                e = XPathList("e", {_text = XPath("text()")}); // *trigger: /a
        }
}

composite Main() {
    graph
        stream<rstring s> Input = FileSource() {
            param
                file : "test1.xml";
                format : line;
        }
        stream<Parse.a_type> X0 = Parse(Input) { }
        () as O0 = FileSink(X0) {
            param file : "out0.dat";
        }
}

XML Support – Standard Library Support Functions

A number of new functions have been added to the library

Safe conversions from string to xml
  – <xml X, string T> void convertToXML(mutable X xmlResult, T input, mutable int32 error);
  – <xml X, string T> public bool convertToXML (mutable X xmlResult, T input);

An XQuery engine is added as an alternative to the XMLParse operator
  – <xml X> public list<rstring> xquery (X input, rstring xqueryExpression);
  – <xml X> public list<rstring> xquery (X input, rstring xqueryExpression, mutable int32 error);
  – And numerous more flavors
  – All return a list of rstrings with the query results
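The idea behind the "safe" conversions, returning a status instead of throwing, can be sketched in Python. This is an illustration only: `convert_to_xml` is a hypothetical stand-in that checks well-formedness with ElementTree, and treating a True result as success is an assumption about the return-value convention.

```python
import xml.etree.ElementTree as ET

def convert_to_xml(text):
    """Return (ok, document). ok is False when text is not well-formed XML."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError:
        # A plain cast would raise here; the safe variant reports the failure instead.
        return False, None

ok_good, doc = convert_to_xml("<doc>x</doc>")
ok_bad, _ = convert_to_xml("<doc>")  # unclosed element
print(ok_good, ok_bad)
```

Logic can then branch on the returned status rather than wrapping every cast in exception handling.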

XML Support – XQuery example

type T = tuple<int32 id, tuple<rstring b, list<int32> x, float64 d> a, rstring c>;

stream<T> OutTuples = Custom (Data) {
    logic onTuple Data: {
        // extract string 'c'
        mutable list<rstring> results = xquery(Data.xmlVar, "/something/bar/c/text()");
        mutable rstring s = results[0];

        // extract string 'b' attribute in 'a'
        mutable tuple<rstring b, list<int32> x, float64 d> a = {};
        results = xquery(Data.xmlVar, "/something/bar/a/@bdata");
        a.b = results[0];

        // extract list<int32> 'x' attribute in 'a'
        results = xquery(Data.xmlVar, "/something/bar/foo/text()");
        for (rstring r in results)
            appendM (a.x, (int32) r);

        // extract float64 'd' attribute in 'a'
        results = xquery(Data.xmlVar, "/something/bar/a/d/text()");
        a.d = (float64) results[0];

        // submit the final result
        submit ({id = Data.id, a = a, c = s}, OutTuples);
    }
}

XML Support – Database Toolkit

All database toolkit operators have been extended to support XML
  – XML converted to/from char data for DB that doesn’t support XML
  – DB2 PureXML capabilities are accessible if using DB2 V9.7 or later

Problem

How do I use a for statement to iterate over, and modify, a list?

  – In SPL the for statement looks like

    list<rstring> l = [...];
    for (rstring entry in l) {
        l = "????";
    }

  – This doesn’t work because entry is an rstring value, not an index into the list

  – You need a list of indexes with the same number of entries as the list you are iterating over

    for (int32 i in indexes) {
        l[i] = "????";
    }

More Efficient ‘for’ Loops

Introduce a set of ‘range’ functions
  – // return [0, ..., limit-1]
    list<int32> range(int32 limit);
  – // return [start, ..., limit-1]
    list<int32> range(int32 start, int32 limit);
  – // return [start, start+step, ... number < limit]
    list<int32> range(int32 start, int32 limit, int32 step)
  – // return [0, ..., size(l)-1]
    <list T> list<int32> range(T l)

Use:
  – mutable list<rstring> myList = ["hi", "there"];
    for (int32 i in range(myList)) {
        myList[i] = upper(myList[i]);
    }

Compiled directly into the generated C++ loop when used inside a for loop
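The four overloads as described can be modeled in a few lines of Python. This is a sketch of the listed return values only: `spl_range` is a hypothetical name, and behavior for negative or zero steps is not covered by the slide and not modeled here.

```python
def spl_range(first, limit=None, step=1):
    """Model of the four range overloads as listed on the slide."""
    if isinstance(first, list):      # range(l)                  -> [0, ..., size(l)-1]
        return list(range(len(first)))
    if limit is None:                # range(limit)              -> [0, ..., limit-1]
        return list(range(first))
    return list(range(first, limit, step))  # range(start, limit[, step])

# Index-based iteration lets the loop body modify the list in place.
my_list = ["hi", "there"]
for i in spl_range(my_list):
    my_list[i] = my_list[i].upper()

print(my_list)
```

Iterating over indexes rather than values is what makes the in-place update possible, which is the problem the range functions address.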

More Efficient ‘for’ loops

logic ....: {
    mutable list<rstring> myList = ["hi", "there"];
    for (int32 i in range(myList)) {
        myList[i] = upper(myList[i]);
        println(myList[i]);
    }
}

SPL::list<SPL::rstring > myList = ...;
SPL::int32 temp = myList.size();
for (SPL::int32 i = 0; i < temp; i++) {
    myList.at(i) = ...::upper(myList.at(i));
    ...::println(myList.at(i));
}

Other SPL Changes

SPADE to SPL translator removed
  – Must install Streams 2.x if you need translation

submit([tuple|punct], portNo) functions added
  – Enable dynamic port selection
  – Will raise an exception at runtime if port invalid

Return statement allowed in logic clause to enable simplification
  – Does not affect the normal processing of tuples in the generated primitive operator.

New Perl regex compatible functions added
  – regexMatchPerl
  – regexReplacePerl
  – Both rstring and ustring variants

Problem

I want to write a primitive operator with custom output functions that can be nested within an output assignment
  – You saw this in the examples of the XMLParse operator

Operator Model Changes

Allow Custom Output Functions to be nested within an expression
  – Recall from XMLParse:
    • output O : attr = XPathList("...", XPathList(...));
  – outputPortOpenSetType
    • <allowNestedCustomOutputFunctions> true/false

Allow Custom Output Functions to be used in a param expression
  – parameterType
    • <customOutputFunction> - name of a COF that can appear

Operator Model Changes

To support nested COFs and COFs in a param expression
  – Compiler will optionally generate an expression tree into the Operator Instance Model (OIM)
  – APIs provided in the OIM interface to walk the expression tree and query characteristics
  – APIs documented in html in doc/spl/operator/code-generation-api/perl
  – Also support for generation of C++ code from the expression tree
  – Use documented in the Toolkit Developer’s Reference


Problem

I have a Streams application that Imports from, or Exports to, another streams application

I would like to dynamically update the Export properties or the Import subscription

Export Property/Import Subscription Update from SPL code

Allows SPL programs to query/update properties/subscriptions without having to use primitive operators.
  – getOutputPortExportProperties
  – setOutputPortExportProperties
  – getInputPortImportSubscription
  – setInputPortImportSubscription

Port must come from Import or go to Export operator that uses subscription/property

Triggers a disconnect/reconnect

Use:
    setInputPortImportSubscription('stock == "IBM"', 0u);


Problem

I provide a toolkit that is used in various countries. I would like to be able to load strings in a language appropriate for the locale in effect where my toolkit is used.

Within those strings I would like numeric values, for example, to be formatted in a locale sensitive way.

Localization Support

Utilizes ICU under the covers

Resource bundle creation

Locale sensitive loading mechanism

Translatable strings contained in XLIFF files
  – XML Localization Interchange File Format

Specify .xlf files in info.xml file

Resource bundles built during toolkit indexing

C++ header and Perl module generated

Standard library functions added to load resource
  – loadAndFormatResource

Localized strings available at compile-time and run-time

Localization sample program

Documented in the Toolkit Development Reference

XLIFF File

<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.1 _path_/xliff-core-1.1.xsd">
  <file datatype="plaintext" original="root.txt" source-language="en" target-language="en" xml:space="preserve">
    <body>
      <group>
        <trans-unit id="1" extraData="MESSAGE_1" resname="MSG0001">
          <source>English: A message emitted at compile time.</source>
        </trans-unit>
        <trans-unit id="2" extraData="MESSAGE_2" resname="MSG0002">
          <source>English: A message emitted at run time. A formatted value ''{0,number,currency}''.</source>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>

Usage:

<%
# Add a require for the Perl module that contains the subroutine that loads and formats the string.
require MyResource;

# Emit the message using an SPL helper method
SPL::CodeGen::println(MyResource::MESSAGE_1());
%>

// Add an include for the header that contains the macro which loads and formats the message
#include "MyResource.h"

// Tuple processing for non-mutating ports
void MY_OPERATOR::process(Tuple const & tuple, uint32_t port)
{
    const IPort0Type & t = static_cast<const IPort0Type &>(tuple);

    // Get the loaded and formatted message and initialize the output tuple
    SPL::rstring r = MESSAGE_2(t.get_i());

    // Add a message to the runtime log
    SPLAPPLOG(L_INFO, r, "test");
}


Standard Toolkit Changes


Problem

I find the Beacon operator somewhat limited in how I can use it as a stream generator. Is there another way of generating tuples?

Custom Operator as a Source

A Custom operator with no inputs can act as a source

(stream<int32 a> A; stream<int32 a> B) = Custom() {
    logic
        state : mutable int32 i = 9;
        onProcess : {
            for (int32 x in range(10)) {
                submit ({a = x + i}, A);
                submit ({a = 6 + x + i}, B);
                i++;
            }
        }
}

onProcess clause added
  – Only allowed in a Custom operator
  – Only allowed if there are no input ports
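To see which tuples the example source would emit, the loop can be traced with a small Python generator. This is a model of the loop arithmetic only, under the assumption that onProcess runs the body once. The name `custom_source` is hypothetical.

```python
def custom_source():
    """Yield (port, tuple) pairs in the order the example submits them."""
    i = 9                               # logic state : mutable int32 i = 9
    for x in range(10):                 # for (int32 x in range(10))
        yield "A", {"a": x + i}         # submit ({a = x + i}, A)
        yield "B", {"a": 6 + x + i}     # submit ({a = 6 + x + i}, B)
        i += 1                          # i++

submitted = list(custom_source())
stream_a = [t["a"] for port, t in submitted if port == "A"]
stream_b = [t["a"] for port, t in submitted if port == "B"]
print(stream_a)
print(stream_b)
```

Because both x and i advance on each iteration, the values on each stream increase by two per tuple.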


Problem

My Streams application imports data from another application, but I am only interested in part of the data. I have to add a Functor to filter out a lot of the imported data. That wastes a lot of transport time.

Filter Support for Import

A new filter param added to the Import operator
  – boolean or rstring expression

type streamT = int64 value, rstring str, int32 x;
stream<streamT> I = Import () {
    param
        subscription : "a >= 55";
        filter : value < 0 && str == "foo";
}

Filtering will be performed at Export operator and only matching tuples will be seen at the Import operator

Filter Support for Import

Filter expressions support the same types as subscription expressions:
  – int64, rstring, float64, lists of same

Export operator has a new parameter:
  – allowFilter: true/false;

If allowFilter is false, an Import with a filter parameter will not connect to the Export operator

New metric added to PE output port (per connection):
  – TuplesFilteredOut
  – Shows number of tuples not sent over connection

New functions to query/update filter expression
  – getInputPortImportFilterExpression
  – setInputPortImportFilterExpression
  – Update is asynchronous
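The sender-side filtering described above can be pictured with a toy Python model. The class `ExportConnection` and its attribute names are hypothetical. Only the behaviors stated on the slides are modeled: filtering happens before transport, a per-connection counter tracks dropped tuples, and a filtered Import does not connect when allowFilter is false.

```python
class ExportConnection:
    """Toy model of one Export-to-Import connection (not the Streams API)."""

    def __init__(self, allow_filter, import_filter=None):
        # An Import with a filter does not connect when allowFilter is false.
        self.connected = allow_filter or import_filter is None
        self.import_filter = import_filter
        self.tuples_filtered_out = 0    # models the TuplesFilteredOut metric
        self.delivered = []             # tuples that actually cross the connection

    def send(self, tup):
        if not self.connected:
            return
        if self.import_filter is not None and not self.import_filter(tup):
            self.tuples_filtered_out += 1   # dropped on the Export side
            return
        self.delivered.append(tup)

# filter : value < 0 && str == "foo"
conn = ExportConnection(True, lambda t: t["value"] < 0 and t["str"] == "foo")
for tup in [{"value": -1, "str": "foo"}, {"value": 5, "str": "foo"}, {"value": -2, "str": "bar"}]:
    conn.send(tup)

refused = ExportConnection(False, lambda t: True)
print(conn.delivered, conn.tuples_filtered_out, refused.connected)
```

Only the matching tuple is transported, which is the saving compared with importing everything and filtering in a downstream Functor.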


Problem

I have noticed, when using windows in my primitive operator, that they cache every tuple within the library.

It occurs to me that, at least with tumbling windows, it shouldn’t be necessary to cache the tuples.

Window Library Extended to Optimize Tumbling Windows

In Streams Version 2.0 the window library cached all tuples
  – Can use a lot of memory

In many cases it is not necessary to cache all the tuples
  – e.g., compute the average of attribute price in a tumbling window
    • Requires only a count and a running total

In Version 3.0 the window library is extended with Summarizers
  – Aggregate operator updated to use this optimization
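The count-and-running-total idea can be sketched in Python. This illustrates the concept only, not the window library's Summarizer interface. The class and function names are hypothetical.

```python
class AverageSummarizer:
    """Keeps only a count and a running total for the current window."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, price):
        self.count += 1
        self.total += price

    def result(self):
        return self.total / self.count

def tumbling_averages(prices, window_size):
    """Average per tumbling window without caching the tuples themselves."""
    averages = []
    summary = AverageSummarizer()
    for price in prices:
        summary.add(price)
        if summary.count == window_size:    # window is full: emit and start over
            averages.append(summary.result())
            summary = AverageSummarizer()
    return averages

print(tumbling_averages([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], 3))
```

Memory use stays constant per window regardless of the window size, which is the point of the optimization.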


Problem

The Aggregate operator is very useful, however, if I want my own type of aggregation, I must re-write the whole operator.

User Extension of the Aggregate Operator

Version 3.0 adds a new aggregation function called “Custom”
  – Allows you to implement arbitrary aggregation functions without duplicating Aggregate operator

Requires mutable state tuple

Takes three functions as arguments
  – init function
    • Takes state, returns boolean
  – process function
    • Takes attribute and state, returns boolean
  – result function
    • Takes state, returns attribute type

This is an operator change and not a language change
  – Nested calls must return a value

Example

type AvgContext = float64 sum, int32 count;
...
stream<float64 avg> B = Aggregate (Src) {
    logic state : mutable AvgContext avgContext;
    window Src : tumbling, count(3);
    output B : avg = Custom(myInit(avgContext), process(price, avgContext), result(avgContext));
}

Example

type AvgContext = float64 sum, int32 count;

boolean myInit (mutable AvgContext c) {
    c.sum = 0.0;
    c.count = 0;
    return false;
}

boolean process (float64 value, mutable AvgContext c) {
    c.count++;
    c.sum += value;
    return true;
}

float64 result (AvgContext c) {
    if (c.count == 0) return 0.0;
    return c.sum/(float64)c.count;
}
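How the three functions might be driven over a tumbling count(3) window can be traced in Python. The call order used here (init at the start of each window, process per tuple, result at flush) is an assumption for illustration, since the slides do not spell it out. The driver `custom_aggregate` is hypothetical.

```python
def my_init(ctx):
    ctx["sum"] = 0.0
    ctx["count"] = 0
    return False

def process(value, ctx):
    ctx["count"] += 1
    ctx["sum"] += value
    return True

def result(ctx):
    if ctx["count"] == 0:
        return 0.0
    return ctx["sum"] / ctx["count"]

def custom_aggregate(prices, window_size, ctx):
    """Assumed call order: init per window, process per tuple, result at flush."""
    outputs = []
    # Only complete windows are flushed; a trailing partial window is left pending.
    for start in range(0, len(prices) - window_size + 1, window_size):
        my_init(ctx)
        for price in prices[start:start + window_size]:
            process(price, ctx)
        outputs.append(result(ctx))
    return outputs

avg_context = {}    # plays the role of the mutable AvgContext state
averages = custom_aggregate([3.0, 6.0, 9.0, 1.0, 2.0, 3.0, 100.0], 3, avg_context)
print(averages)
```

The state lives outside the functions and is passed in on every call, mirroring the mutable state tuple declared in the operator's logic clause.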

More on Custom

Can be used more than once
  – Each unique aggregation needs its own functions and state
    • Can re-use functions with unique state

Can be used with partitionBy/groupBy
  – Does not use summarizers for tumbling windows (stores tuples)
    • Only one state, no partition/group information exchanged
    • User maintains state

Can be interleaved with pre-defined aggregation functions
  – output O : ticker=Any(ticker), myAvg=Custom(..process(price, ...), ...);


Problem

I would really like to augment the output tuple of a ???Source operator with attribute data that doesn’t come from the input file, socket, whatever.

Enhancements to FileSource/UDPSource/TCPSource

Source operators can now have an output clause

New Custom Output Functions added:
  – FileSource: TupleNumber(), FileName()
  – TCPSource: TupleNumber(), RemoteIP(), RemotePort(), LocalPort(), ServerPort()
  – UDPSource: TupleNumber(), RemoteIP(), RemotePort(), ServerPort()

Use:
  – output A: fName = FileName(), value = someFcn();

All other output stream attributes filled from data source as usual


Problem

I have attributes in my stream that I don’t want my ???Sink to write out.

What I’d really like to do is use some of those attributes to, say, help generate a unique file name that could be used by the FileSink.

Enhancements to FileSink

suppress : inputAttributes;
  – named attributes will not be written to file

Add closeMode : dynamic
  – file parameter expression allowed to reference input attributes
  – file expression is evaluated for each tuple, and if it changes, the existing file will be closed and a new file opened based on the value

New filename formatting specifiers
  – {localtime:strftimeString}
  – {gmtime:strftimeString}

Enhancements to FileSink

param file : "{localtime:%m.%d}_" + CompanyName + ".txt";
  – This would generate a filename with %m being the current month number, %d the current day in the month, an underscore (_), the value of the CompanyName input attribute, and then ".txt"
  – Each time the month, day or company name changes a new filename would be used
  – Useful with the append param
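The filename expansion and the close-and-reopen behavior can be traced with a short Python sketch. It uses the standard strftime codes that the specifier is named after. The function names are hypothetical, and timestamps are passed in explicitly (rather than read from the clock) so the result is reproducible.

```python
import time

def expand_file_name(company_name, when):
    """Model of "{localtime:%m.%d}_" + CompanyName + ".txt" for a given time."""
    return time.strftime("%m.%d", when) + "_" + company_name + ".txt"

def files_opened(tuples):
    """List the files a dynamic-close sink would open, in order."""
    opened = []
    current = None
    for company, when in tuples:
        name = expand_file_name(company, when)
        if name != current:     # name changed: close the old file, open a new one
            opened.append(name)
            current = name
    return opened

day1 = time.strptime("2012-10-05", "%Y-%m-%d")
day2 = time.strptime("2012-10-06", "%Y-%m-%d")
names = files_opened([("IBM", day1), ("IBM", day1), ("ACME", day1), ("ACME", day2)])
print(names)
```

If the same name comes up again later, the file would be reopened, which is why the append parameter is useful here.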

Enhancements to TCPSink

closeMode and suppress parameters added
  – never – current behaviour
  – dynamic - to connect based on host/port in tuple

type FinalResult = tuple<rstring host, uint32 port, uint32 value>;
graph
    () as TCPSink1 = TCPSink(A) {
        param
            closeMode : dynamic;
            suppress : host, port;
            address : host;
            port : port;
            role : client;
            retryFailedSends : true;
    }

Need role : client, closeMode : dynamic, retryFailedSends : true

address and port can use runtime expressions

TCPSink will close old connection and re-open new one if host or port changes

Page 62: SPL Enhancements  InfoSphere Streams Version 3.0

Enhancements to FileSink

Improved write error detection/handling

New parameter: writeFailureAction
– Optional; values are ignore (the default), log, terminate.
– ignore: do nothing, and all future writes will fail.
– log: the error is logged and the error condition is cleared. Further writes may again cause failures, if the underlying cause is not corrected. Even if the underlying cause is corrected, there will be gaps in the file due to the failed writes.
– terminate: an error is logged, and the operator will terminate.

Closing a file (closeMode) will reset the error. Future writes should succeed if the underlying problem has been corrected.

Page 63: SPL Enhancements  InfoSphere Streams Version 3.0

Problem

I would like to have greater control over what my FileSink operator does when it encounters a write error

Page 64: SPL Enhancements  InfoSphere Streams Version 3.0

Enhancements to FileSink

New metric: nTupleWriteErrors
– Number of tuple writes (not tuples) that had an error on the file stream after the write completed.
– Due to buffering, write failure may not be detected immediately. Use param flush : 1u; to ensure quicker detection, but with a (possibly large) performance penalty
– Once failure is detected, all future writes will fail unless the error condition is cleared.
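
A minimal sketch of a FileSink configured for early detection of write errors, combining flush with the writeFailureAction parameter. The stream name Results and the file name are hypothetical.

```spl
() as ResultWriter = FileSink(Results) {
  param
    file               : "results.csv";  // hypothetical output file
    flush              : 1u;   // flush after every tuple so a failure shows up quickly (slower)
    writeFailureAction : log;  // log the error and clear it so later writes are attempted
}
```

With log, the nTupleWriteErrors metric still counts each failed write, so gaps in the file can be spotted after the fact.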

Page 65: SPL Enhancements  InfoSphere Streams Version 3.0

Problem

I have some data that I would like to ingest, but it is in a compression format that is not supported by the existing operators.

I don’t want to have to write a whole Source operator in order to support it.

Page 66: SPL Enhancements  InfoSphere Streams Version 3.0

New Utility Operators

A new set of operators has been added allowing functionality of Standard Source/Sink operators to be extended
– Parse: accept blob (e.g. csv), generate tuple
– Format: accept tuple, generate blob (e.g. csv)
– Decompress: decompress data in blob (gzip), generate blob
– Compress: compress data in blob, generate blob (gzip)
– CharacterTransform: convert from one encoding in blob to another encoding in blob

Allows supporting more data formats without having to write a complete Source/Sink operator

Page 67: SPL Enhancements  InfoSphere Streams Version 3.0

New Utility Operators

stream<…> Tuples = XXXSource() {
  param
    compression : gzip;
    encoding : "ISO_8859-9";
    format : csv;
}

This is equivalent to:

stream<blob data> Input = XXXSource() {
  …
  param
    format : block;
    blockSize : 4096u;
}

stream<Input> Uncompressed = Decompress(Input) {
  param compression : gzip;
}

stream<Input> Decoded = CharacterTransform(Uncompressed) {
  output Decoded : data = Convert("ISO_8859-9", "UTF8", data);
}

stream<…> Tuples = Parse(Decoded) { … param format : csv; … }

Page 68: SPL Enhancements  InfoSphere Streams Version 3.0

New Utility Operators

stream<blob data> Formatted = Format(someStream) {
  param format : csv;
}

stream<Formatted> Encoded = CharacterTransform(Formatted) {
  output Encoded : data = Convert("UTF8", "ISO_8859-9", data);
}

stream<Formatted> Compressed = Compress(Encoded) {
  param compression : gzip;
}

() as Nul = XXXSink(Compressed) {
  param
    format : block;
    blockSize : 4096u;
}

Page 69: SPL Enhancements  InfoSphere Streams Version 3.0

Problem

I have an operator which has a table I need to initialize at startup. I’d like to ensure that no tuples will flow to my operator prior to its initialization

Page 70: SPL Enhancements  InfoSphere Streams Version 3.0

Switch Operator

The Switch operator acts like an open/closed switch.

When open, tuples will wait until the switch closes

A control port will set the switch open or closed

Useful to allow an operator to finish initialization before processing a stream

After initialized, send a tuple to Switch to allow tuples to flow
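
The gating pattern described above could be wired up roughly as follows. This is a sketch under assumptions: the slides do not name the Switch parameters, so status (an expression evaluated on each control tuple) and initialStatus are assumed here and should be checked against the standard toolkit documentation. The stream names, and the Beacon that stands in for the "initialization finished" signal, are hypothetical.

```spl
composite SwitchExample {
  graph
    // Data that must not reach the consumer before its table is loaded.
    stream<int32 id> Data = Beacon() {
      param period : 0.1;
    }

    // Hypothetical stand-in for the initializer: one control tuple after
    // five seconds, meaning "initialization finished".
    stream<boolean ready> Control = Beacon() {
      param
        initDelay  : 5.0;
        iterations : 1u;
      output Control : ready = true;
    }

    // Assumed semantics: the switch starts open (tuples wait) and closes
    // (tuples flow) once a control tuple makes status evaluate to true.
    stream<Data> Gated = Switch(Data; Control) {
      param
        initialStatus : false;
        status        : ready;
    }
}
```

In a real application the control tuple would come from the operator that owns the table, submitted once its initialization completes.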

Page 71: SPL Enhancements  InfoSphere Streams Version 3.0

Problem

I would like some way to ensure, when making a TCP connection using a TCPSource/TCPSink, that the format of the data being sent is compatible with what the Source is expecting.

Page 72: SPL Enhancements  InfoSphere Streams Version 3.0

Stream Schema Checking for TCPSource/TCPSink

param confirmWireFormat : true/false;
– Default is false

If true, TCP server role will send information about data to be sent
– Tuple schema, format, compression, encoding, hasDelay, contains punctuation

TCP client role will determine if the information is compatible
– Returns go/nogo status and optional message

If not compatible, connection closed
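
A sketch of a matching sink/source pair with the check turned on. It assumes that both ends set confirmWireFormat, which the slides do not state explicitly. The stream names, schema, host and port number are hypothetical.

```spl
// Server end: sends a description of the data it is about to send.
() as Sender = TCPSink(Results) {
  param
    role              : server;
    port              : 23145u;       // hypothetical port
    format            : csv;
    confirmWireFormat : true;
}

// Client end (typically in another application): compares the description
// with its own schema and settings; the connection is closed on a mismatch.
stream<rstring name, uint32 value> Received = TCPSource() {
  param
    role              : client;
    address           : "localhost";  // hypothetical host
    port              : 23145u;
    format            : csv;
    confirmWireFormat : true;
}
```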

Page 73: SPL Enhancements  InfoSphere Streams Version 3.0

Problem

When you release a new version of the product, you never tell me about the little things you added

Page 74: SPL Enhancements  InfoSphere Streams Version 3.0

Miscellaneous Changes

Import: subscription and filter support mod (%)
– identifier % int64Lit compareOp int64Lit
– param subscription : anId % 10ll >= 8ll;

Add param writePunctuations: boolean to *Sink with format bin to write punctuations to output

param readPunctuations with format bin for *Source will read punctuations from stream and submit downstream

Optional second output port on DeDuplicate that emits the tuples that are duplicates